Abstract
Genetic programming (GP) automatically designs programs. Evolutionary programming (EP) is a real-valued global optimisation method that uses a probability distribution, such as the Gaussian, Cauchy, or Lévy distribution, as its mutation operator. This study proposes a hyper-heuristic approach that employs GP to automatically design different mutation operators for EP. At each generation, the EP algorithm can adaptively explore the search space according to historical information. The experimental results demonstrate that EP with adaptive mutation operators, designed by the proposed hyper-heuristic, outperforms other EP versions (both manually and automatically designed). Many researchers in evolutionary computation advocate adaptive search operators (which change over time) over non-adaptive operators (which do not). The core finding of this study is that we can automatically design adaptive mutation operators that outperform automatically designed non-adaptive mutation operators.
Introduction
Genetic programming (GP) [1] is a branch of evolutionary computation that can generate computer programs, and is widely applied in numerous fields [2,3,4,5,6,7,8,9,10]. Evolutionary programming (EP) is a black-box optimiser in which mutation is the only operator. Researchers have recommended different probability distributions as mutation operators and analysed their characteristics. For example, Yao et al. [11] pointed out that Cauchy mutation performs better than Gaussian mutation by virtue of a higher probability of making large jumps, although large step sizes are typically detrimental towards the end of the search, when the current search points are close to the global optimum. Hong et al. [12] noted that when a non-adaptive mutation operator is used in EP, more offspring usually survive in the early generations of a run and fewer in the later generations. Researchers have also proposed different mutation strategies to improve EP efficiency [12,13,14,15,16,17]. ‘A hyper-heuristic is a search method or learning mechanism for selecting or generating heuristics to solve computational search problems’ [18]. Hyper-heuristics are classified according to the source of feedback in the learning process: online learning hyper-heuristics learn from a single instance of a problem, whereas offline learning hyper-heuristics learn from a set of training instances and generalise to unseen instances [18]. Both online [2,3,4,5] and offline [6,7,8,9,10] hyper-heuristics have been applied to various research fields.
Hyper-heuristics are an effective and popular technique that has been applied to a wide range of problem domains. Cowling et al. [2] used an online learning hyper-heuristic to minimise the number of delegates who actually attend a sales summit out of a number of possible delegate attendees. Dowsland et al. [3] used an online learning hyper-heuristic to design and evaluate a heuristic solution to the problem of selecting a set of shippers that minimises the total annual volume of space required to accommodate a given set of products with known annual shipment quantities. Ochoa et al. [4] described a number of extensions to the HyFlex framework that enable the implementation of more robust and effective adaptive search heuristics. Pisinger et al. [5] presented a unified heuristic able to solve five different variants of the vehicle routing problem. Shao et al. [6] used multi-objective genetic programming as an offline learning hyper-heuristic for feature learning in image classification. Hong et al. [7, 8] used GP as an offline learning hyper-heuristic to automatically design a mutation operator for EP, and later to automatically design more general mutation operators for EP [9]. Ross et al. [10] used an offline learning hyper-heuristic as a step towards a new way of using evolutionary algorithms that may address some problems of acceptability for real-world use.
In this study, the heuristics are adaptive mutation operators generated by GP within an offline hyper-heuristic. In other words, we use a GP-based offline hyper-heuristic to redesign a part of the EP algorithm (i.e. the probability distribution) so as to improve the performance of the overall EP algorithm. The contribution of this study is that it realises the ‘automatic’ design of ‘adaptive’ mutation operators, achieving both ‘automatic’ and ‘adaptive’ at the same time. Previous studies either automatically designed static/non-adaptive mutation operators [7,8,9], or manually designed adaptive mutation operators or mutation strategies for EP [13, 16, 17, 19,20,21,22]. To achieve this, a set of adaptive factors is proposed, which collect and update historical information during the evolutionary process. The proposed method also contributes a group of automatically designed adaptive mutation operators for function classes. In essence, these adaptive factors are variables provided in the GP terminal set; GP uses them to automatically design a mutation operator (a random number generator) that replaces the human-designed mutation operator in the EP algorithm. In EP, each individual is taken as a pair of real-valued vectors. The variables partly reflect an individual’s current position and the evolutionary status. Because they change during evolution and affect the characteristics of mutation, we call them adaptive factors. For example, \(CUR\_MIN\_X\) is the minimum value of the best individuals up to the current generation. \(N(\mu , CUR\_MIN\_X)\) is a mutation operator; since \(CUR\_MIN\_X\) is updated in each EP generation, the size of the jumps the mutation operator can make also keeps updating in each EP generation. \(CUR\_MIN\_X\) is one type of adaptive factor. The adaptive factors are collected and employed by EP in both the training and testing stages.
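To make the example concrete, the following sketch (hypothetical Python, not the paper's implementation) tracks a \(CUR\_MIN\_X\)-style factor across generations and uses it as the scale of a Gaussian mutation, so the jump size the operator can make changes as the best-so-far individual changes:

```python
import random


class AdaptiveFactorTracker:
    """Tracks CUR_MIN_X, the minimum absolute component value of the
    best individuals seen so far, and uses it as the scale of a Gaussian
    mutation operator N(mu, CUR_MIN_X).  Illustrative sketch only; the
    class and method names are hypothetical."""

    def __init__(self):
        self.cur_min_x = None

    def update(self, best_individual):
        # Recompute the factor from the current best real-valued vector,
        # keeping the minimum over all generations so far.
        m = min(abs(v) for v in best_individual)
        self.cur_min_x = m if self.cur_min_x is None else min(self.cur_min_x, m)

    def mutate_step(self, mu=0.0):
        # The jump size adapts because cur_min_x changes every generation.
        return random.gauss(mu, self.cur_min_x)
```

Calling `update` once per generation with the best individual keeps the operator's scale in step with the search.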
The hypothesis of this study, inspired by [8, 9, 11, 12], is that for a specific class of functions, GP can design an adaptive mutation operator for EP that outperforms an automatically designed non-adaptive mutation operator. The adaptive factors collected during an EP run allow the size of the jumps to vary. A set of adaptive mutation operators can thus be discovered, one per function class. For more details, please refer to Sect. 4.
The outline of this paper is as follows: In Sect. 2, we describe function optimisation and the basic EP algorithm. In Sect. 3, we describe the adaptive mutation operator. Section 4 describes the connections between GP and EP. Section 5 describes the benchmark functions and their classes, how a mutation operator is trained, and the test results. Here, we contrast the performance of previously proposed automatically designed non-adaptive mutation operators (ADMs) with automatically designed adaptive mutation operators (ADAMs). In Sect. 6, we analyse and compare the testing results, and in Sect. 7, we summarise and conclude the paper.
The basic EP algorithm
The EP algorithm evolves a population of numerical vectors to find near-optimum function values. Mutation is the only operator in EP, and EP researchers have recently focussed primarily on manually designing mutation operators, or smart strategies for using them [11,12,13,14,15,16, 23].
Minimisation can be formalised as a pair (S, f), where \(S \subseteq {\mathbb {R}}^{n}\) is a bounded set on \({\mathbb {R}}^{n}\) and \(f:S\longrightarrow {\mathbb {R}}\) is an n-dimensional real-valued function. S is the search space of the problem (function f). The aim of EP is to find a point \(x_{\min } \in S\) such that \(f(x_{\min })\) is a global minimum on S, or a close approximation to it. More specifically, the requirement is to find an \(x_{\min }\in S\) such that
$$\begin{aligned} \forall x \in S: f(x_{\min }) \le f(x). \end{aligned}$$
f does not need to be continuous or differentiable, but it must be bounded (i.e. S is bounded). The EP mutation process is represented by the following equations:
$$\begin{aligned} x_{i}^{\prime }(j)&= x_{i}(j) + \eta _{i}(j) D_{j}, \\ \eta _{i}^{\prime }(j)&= \eta _{i}(j) \exp \left( \gamma ^{\prime } N(0,1) + \gamma N_{j}(0,1)\right) . \end{aligned}$$
In the above equations, each individual is taken as a pair of real-valued vectors, (\(x_{i}\), \(\eta _{i}\)), \(\forall i\in \{1, \cdots , \mu \}\), where \(\mu \) is the number of individuals in the population, n is the dimensionality of f, and j denotes the jth component of the vectors \(x_{i}\), \(x_{i}^{\prime }\), \(\eta _{i}\), and \(\eta _{i}^{\prime }\). The factors \(\gamma \) and \(\gamma ^{\prime }\) are set to \((\sqrt{2\sqrt{n}})^{-1}\) and \((\sqrt{2n})^{-1}\), respectively. \(D_{j}\) represents the mutation operator; researchers usually use a Cauchy, Gaussian, or Lévy distribution \(L_{\alpha , \gamma }(y)\) as the mutation operator [11, 14, 23]. Lee et al. [14] point out that the Lévy distribution with \(\alpha = 1.0\) is the Cauchy distribution, and with \(\alpha = 2.0\) it is the Gaussian distribution. For a complete description of EP, refer to [24].
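The update equations above can be sketched as a single mutation step (a stdlib-only illustration of the standard self-adaptive EP update; the function name and interfaces are assumptions, and `d` stands in for the distribution \(D_{j}\)):

```python
import math
import random


def ep_mutate(x, eta, d=lambda: random.gauss(0.0, 1.0)):
    """One EP mutation of an individual (x, eta): a sketch of the
    self-adaptive update equations (cf. [11, 24]).  `d` draws from the
    mutation distribution D_j (standard Gaussian by default)."""
    n = len(x)
    gamma = 1.0 / math.sqrt(2.0 * math.sqrt(n))   # gamma
    gamma_p = 1.0 / math.sqrt(2.0 * n)            # gamma'
    common = gamma_p * random.gauss(0.0, 1.0)     # N(0,1) shared by all j
    x_new, eta_new = [], []
    for j in range(n):
        # Strategy parameter update: eta'_i(j) = eta_i(j) exp(gamma' N + gamma N_j)
        e = eta[j] * math.exp(common + gamma * random.gauss(0.0, 1.0))
        eta_new.append(e)
        # Object variable update: x'_i(j) = x_i(j) + eta_i(j) * D_j
        x_new.append(x[j] + eta[j] * d())
    return x_new, eta_new
```

Swapping `d` for a Cauchy or Lévy sampler changes the jump-size profile without touching the rest of the update.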
In this study, the hyper-heuristic framework designs an adaptive mutation operator (which can also be seen as an adaptive mutation strategy) to replace the probability distribution \(D_{j}\). The EP algorithm then uses this candidate mutation operator on functions generated from the function classes.
Adaptive mutation operator
This study focuses on automatically designing adaptive heuristics (i.e. random number generators used as mutation operators in EP). Human-designed adaptive heuristics, however, have already been proposed in numerous studies; below are examples of human-designed heuristics relating to adaptiveness, adaptive mutation operators, and mutation strategies:
The concept of adaptiveness is widely used in many technologies. In [25], new local search procedures were combined in an adaptive large neighbourhood search metaheuristic to generate solutions for the gate matrix layout problem. An adaptive charged system search algorithm was developed to solve economic dispatch problems in [26]. To make the terminal phase of the standard global positioning system and inertial navigation system (GPS/INS) landing system more precise, an adaptive fuzzy data fusion algorithm was developed, yielding more accurate state estimates as the vehicle approaches the landing surface [27].
A novel adaptive strategy was developed to dynamically tune the control parameters of the random learning operator, so that the improved adaptive human learning optimisation can efficiently accelerate convergence at the beginning of an iteration, maintain diversity in the middle of the search to better explore the solution space, and perform an accurate local search at the end to find the optima [28].
A hybrid adaptive evolutionary algorithm was introduced to improve the performance of the search operators across the various stages of the search/optimisation process of evolutionary algorithms [29]. A neighbourhood-adaptive differential evolution method was also proposed; in this framework, multiple neighbourhood relationships are defined for each individual and are then adaptively selected for specific functions during the evolutionary process [30].
An adaptive switching particle swarm optimisation algorithm using a hybrid update sequence was proposed, which can automatically switch between synchronous and asynchronous updating during the evolutionary process [31]. In [32], a method was presented for reusing the valuable information available from previous individuals to guide the later search; in this approach, prior useful information was fed back into the updating process.
The concept of adaptive mutation operators has been proposed in numerous studies. The adaptive mutation operator proposed for particle swarm optimisation in [33] uses three mutation operators (Cauchy, Gaussian, and Lévy); the selection ratios of mutation operators that produce offspring with lower fitness values are decreased, while those of mutation operators that produce offspring with higher fitness values are increased. In [34], an adaptive mutation was developed for particle swarm optimisation in airfoil aerodynamic design. In [35], an adaptive strategy was proposed for differential evolution, with a Cauchy distribution (\(F_{m}\), 0.1) that uses a fixed scale parameter of 0.1 and an adaptive location parameter \(F_{m}\); the Gaussian (\(C_{\mathrm{{rm}}}\), 0.1) has a fixed standard deviation of 0.1 and a mean of \(C_{\mathrm{{rm}}}\).
Liu et al. [21] investigated operator adaptation in EP at both the population and individual levels. Operator adaptation at the population level aims to update the operator rates based on operator performance over the entire population during evolution [21].
In [17], a mixed mutation strategy based on a local fitness landscape was proposed; in these strategies, the local fitness landscape is a key factor in determining mutation behaviour. In [16], ensemble strategies with adaptive EP were proposed: EP with an ensemble of Gaussian and Cauchy mutation operators, where each mutation operator has its own population and parameters. In [12], mutation operators with different parameters are selected in each EP generation according to the step sizes of the jumps made by individuals; under this strategy, the size of the jumps the mutation operator can make keeps changing during the EP evolutionary process, generating long step sizes at the beginning of the search and short step sizes later on. These studies can all be considered human-designed adaptive mutation strategies.
Regardless of the type of adaptive or non-adaptive mutation operators used in state-of-the-art algorithms, the essence of an adaptive mutation operator is the following:

Use different mutation operators, or a combination of them, for each generation.

Change the jump sizes across generations according to feedback from the current generation or historical information (e.g. each EP generation has a best fitness value; the best fitness values of all generations can be stored in an array, and this array is an example of historical information).
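As a toy illustration of the second point, a jump-size multiplier can be derived from the stored best-fitness history (the specific shrink/grow rule below is an assumption for illustration, not a rule from the paper):

```python
def adaptive_scale(history, base=1.0):
    """Derive a jump-size multiplier from the array of per-generation
    best fitness values (minimisation assumed).  Hypothetical rule:
    shrink the jumps while the search keeps improving (exploit),
    enlarge them when it stagnates (explore)."""
    if len(history) < 2:
        return base                  # not enough feedback yet
    if history[-1] < history[-2]:    # last generation improved
        return base * 0.9
    return base * 1.1                # stagnating: enlarge the jumps
```

The history array here is exactly the kind of information an adaptive mutation operator can feed on at each generation.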
The proposed hyperheuristic GP algorithm
In this section, we describe how GP is used to build EP mutation operators. In previous work, a GP framework successfully designed non-adaptive mutation operators for EP [7, 9]: Hong et al. used GP as an offline learning hyper-heuristic to automatically design a mutation operator for EP [7] and more general mutation operators for EP [9]. In [7], a group of mutation operators was designed for function classes. In [9], the function classes were divided into several groups, and each group was assigned an independent automatically designed mutation operator. In both works, the automatically designed mutation operator is a probability distribution that is fixed at each EP generation: once the probability distribution is automatically designed, the form of the equation and its values are fixed when the EP is run; thus, the search region is fixed at each EP generation.
However, dynamically updating the probability distribution during the evolutionary process has the potential to yield a more efficient search. In this study, we employ the improved framework in Fig. 1, with richer GP terminal settings, using GP as an offline hyper-heuristic to automatically design adaptive mutation operators for EP on specific function classes. In the framework, GP sits at the hyper-level and generates a piece of code (which acts as a mutation operator by generating random numbers according to a probability distribution), while functions are generated from the function classes to be optimised. The automatically generated program is in fact an automatically designed probability distribution with adaptive factors. For example, \(CUR\_MIN\_X\) represents the absolute minimum value of the best individual in an EP run up to the current generation and is updated at each EP generation; thus, the jump sizes of the ADAM change dynamically across EP generations. The settings in Table 1 are the adaptive factors that may appear in the mutation operator. At the base-level, EP optimises functions, and EP is treated as a fitness function by GP in the framework. The fitness value used in GP is the value calculated by EP, averaged over nine runs.
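The hyper-level/base-level relationship can be sketched as follows (hypothetical interfaces: `draw_function` stands in for the function-class generator and `ep_run` for a full base-level EP run returning the best value found):

```python
def gp_fitness(operator, draw_function, ep_run, n_runs=9):
    """Hyper-level fitness of a candidate mutation operator: insert it
    into EP, run EP on functions drawn from the function class, and
    average the best values found (nine runs, as in the paper)."""
    functions = [draw_function() for _ in range(n_runs)]
    return sum(ep_run(f, operator) for f in functions) / n_runs
```

GP then selects and varies candidate operators according to this fitness, exactly as it would with any other fitness function.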
In Fig. 1, the blue arrow between ‘Function class’ and ‘Function to optimise’ represents functions being generated from the function class. The blue arrow between ‘Function to optimise’ and ‘Evolutionary programming’ represents functions being taken and optimised by EP. The blue arrow between ‘Evolutionary programming’ and ‘Adaptive mutation operator generator’ represents the insertion of an ADAM into EP. The adaptive mutation operator is generated by GP and inserted into EP, where it is tested on a set of functions.
In the experiments, we compare the two types of automatically designed mutation operators: a non-adaptive mutation operator (ADM), which is a random number generator following a fixed probability distribution, and an adaptive mutation operator (ADAM), which is a random number generator following a probability distribution that changes dynamically during the EP run; this means that the mutation operator can differ at each EP generation. To test the hypothesis proposed in this study, in contrast with the framework we proposed in [7, 9], we introduce two significant improvements:

The adaptive factors, recalculated at each EP generation, are proposed and added to the terminal set for GP.

In [7,8,9], the GP framework uses EP as a fitness function to evaluate the performance of GP individuals through non-adaptive mutation operators. In this work, the adaptive factor values are used as part of the adaptive mutation operator for EP. Because adaptive factors are added, EP needs to calculate and update their values at each generation. The updated values of the adaptive factors change the probability distribution of the possible step sizes that the mutation operator can make.
Experimental design
We call a random number generator produced by GP without adaptive factors an automatically designed mutation operator (ADM) [7], and a random number generator produced by GP with adaptive factors an automatically designed adaptive mutation operator (ADAM). The experiment is designed to test the following hypothesis: an ADAM can search different regions at different EP generations, and achieves better performance than an ADM for EP on specific function classes. Thus, the adaptive factors are calculated and used by EP while evolution is in progress. In the experiment, 168 functions are generated from each function class: 18 functions are used in the training stage, 50 functions are used in the testing stage, and 100 functions are used in the exchange testing (50 for ADAM and 50 for ADM).
The training stage
The algorithms proposed in [7, 9] demonstrate that automatically designed non-adaptive mutation operators perform better than human-designed mutation operators. In these experiments, we train mutation operators for each function class with two different terminal sets. In one experiment, the GP terminal set contains the adaptive factors (see Table 1) that are generated by EP; in the other, the GP terminal set excludes the adaptive factors (see Table 4). Both training processes use the same set of functions generated from the function classes. In the experiments, ADAM denotes an automatically designed mutation operator with adaptive factors. The tailored ADM automatically designed for function class \(F_{g}\) is denoted \(\hbox {ADM}_{{g}}\), where g is the function class index; \(\hbox {ADM}_{{g}}\) is called a dedicated ADM for function class \(F_{g}\). Similarly, the tailored ADAM automatically designed for function class \(F_{g}\) is denoted \(\hbox {ADAM}_{{g}}\) and is called a dedicated ADAM for function class \(F_{g}\).
In the experiments, where subtree crossover is applied, nodes are randomly selected from both parent trees and the related branches are exchanged, creating two offspring. One-point mutation with the grow initialisation method is applied in GP to generate new offspring [1, 38]. During training, the GP fitness value is derived from the averaged fitness values of nine EP runs: each \(\hbox {ADM}_{{g}}\) or \(\hbox {ADAM}_{{g}}\) is used as the EP mutation operator on nine functions drawn from a given function class, and its fitness value is the average of the best values obtained in the nine individual EP runs on those nine functions. We use the same nine functions from each function class for the entire GP run on a given function class. In total, for each function class, 18 functions are taken for training: nine are used to calculate the fitness value, and the other nine to monitor overfitting.
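Subtree crossover on parse trees can be sketched as below (nested-list trees, a hypothetical stand-in for the paper's GP tree implementation; the helper names are assumptions):

```python
import copy
import random


def subtree_crossover(parent1, parent2, rng=random):
    """Swap randomly chosen subtrees between two GP parse trees
    represented as nested lists ['op', child, ...], producing two
    offspring.  Roots are excluded from selection for simplicity."""
    def nodes(tree, path=()):
        # Yield the path (tuple of child indices) of every node.
        yield path
        if isinstance(tree, list):
            for i, child in enumerate(tree[1:], 1):
                yield from nodes(child, path + (i,))

    def get(tree, path):
        for i in path:
            tree = tree[i]
        return tree

    def set_(tree, path, sub):
        for i in path[:-1]:
            tree = tree[i]
        tree[path[-1]] = sub

    c1, c2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    p1 = rng.choice([p for p in nodes(c1) if p])
    p2 = rng.choice([p for p in nodes(c2) if p])
    s1, s2 = get(c1, p1), get(c2, p2)
    set_(c1, p1, s2)
    set_(c2, p2, s1)
    return c1, c2
```

The offspring jointly contain exactly the material of the two parents, just recombined.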
To make the experimental data more traceable and easier to compare, the function classes and the number of generations for each function class used in this study follow the settings of [9]. The EP parameters follow [11, 14, 23] and are presented in Table 2: the population size is set to 100, the tournament size to 10, and the initial standard deviation to 3.0. The settings for the dimension n and the domains S are listed in Table 6. To reduce the cost of training, the maximum number of EP generations was set to 1000 for \(F_{1}\)–\(F_{13}\) and \(F_{15}\), and to 100 for \(F_{14}\) and \(F_{16}\)–\(F_{23}\).
One terminal set includes the adaptive factors outlined in Table 1, and the other excludes them (see Table 4). In both tables, N is a normal distribution for which the value of \(\mu \) is a random number in \([-2, 2]\); this value may cause the designed mutation operator to not be symmetric about the Y-axis. \(\sigma ^2\) is a random number in [0, 5]. U is the uniform distribution in the range [0, 3]. In Table 1, GEN is the current EP generation index: an integer in the range [1, 1000] or [1, 100], depending on the function classes selected. \(CUR\_MIN\_Y\) is the best fitness value of the individuals up to the current generation, and the framework records the following for a given EP individual (each individual is taken as a pair of real-valued vectors in EP) [23]: \(CUR\_MIN\_X\), the absolute minimum value of this individual; \(CUR\_MAX\_X\), the absolute maximum value of this individual; and \(CUR\_STD\_X\), the standard deviation of this individual. \(CUR\_MAX\_Y\) is the worst fitness value of the individuals in the current generation, and \(CUR\_STD\_Y\) is the standard deviation of all fitness values in the current generation. All these values constitute useful information and change during the EP runs.
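The factors above can be computed roughly as follows (a simplified sketch: for instance, \(CUR\_MIN\_Y\) is taken here from the current generation only, whereas the paper tracks the best value up to the current generation; the exact definitions are those of Table 1):

```python
import statistics


def adaptive_factors(best_individual, fitnesses):
    """Compute simplified adaptive factors from the best individual
    (a real-valued vector) and the current generation's fitness values."""
    return {
        "CUR_MIN_X": min(abs(v) for v in best_individual),
        "CUR_MAX_X": max(abs(v) for v in best_individual),
        "CUR_STD_X": statistics.pstdev(best_individual),
        "CUR_MIN_Y": min(fitnesses),   # paper: best up to current generation
        "CUR_MAX_Y": max(fitnesses),   # worst fitness in current generation
        "CUR_STD_Y": statistics.pstdev(fitnesses),
    }
```

Recomputing this dictionary once per generation is what makes the GP-designed operator adaptive at the base level.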
The GP parameter settings are listed in Table 3. depthnodes is set to 2, indicating that restrictions are applied to the tree size (number of nodes) [36]. The GP function set is listed in Table 5. The other GP settings are: a population size of 20 and a maximum of 25 generations. The framework for automatically designing ADAMs is described in Algorithms 1 and 2.
Testing the \(\hbox {ADAM}_{{g}}\), \(\hbox {ADM}_{{g}}\), and human designed mutation operators
We use the ADAMs (Table 9), ADMs (Table 10), and human-designed EP mutation operators, and test them on each function class \(F_{g}\). For each ADAM or ADM, we record 50 values from 50 independent EP runs, each being the lowest value over all EP generations, and then average them; this is called the mean best value. The test results in Tables 7, 8, and 11 are based on the same 50 functions generated from each function class \(F_{g}\). Thus, 68 functions in total are used across the training and testing stages.
The testing stage
In [11, 13, 17, 23], a suite of 23 benchmark functions is commonly used to test different EP variants, where \(f_{1}\)–\(f_{7}\) are unimodal functions, \(f_{8}\)–\(f_{13}\) are multimodal functions with many local optima, and \(f_{14}\)–\(f_{23}\) are multimodal functions with a few local optima [11].
In this study, we built function classes based on these benchmark functions, following [7]. We use function classes instead of single functions for benchmark function optimisation; in other words, we specially design an ADAM for a class of functions.
The set of functions can be considered a set of training instances for offline learning hyper-heuristics. The framework generates functions (instances) from the function class to train an ADAM or ADM. We then draw an independent set of functions (unseen instances) from the function class to test the ADAMs and ADMs produced in the training stage.
Based on the 23 functions, we constructed the 23 function classes in Table 6, with the index of each function class corresponding to the original function of [23]. In Table 6, \(a_{i}\), \(b_{i}\), and \(c_{i}\) are uniformly distributed in the ranges [1, 2], \([-1, 1]\), and \([-1, 1]\), respectively. The definition of a function class was proposed in [7, 9]. In this study, we use the symbol \(f_{g}\) for a function and \(F_{g}\) for a function class.
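Drawing an instance from a function class can be sketched like this (an illustrative sphere-like class with coefficients \(a_{i} \sim U[1, 2]\); the actual class definitions are those of Table 6):

```python
import random


def make_sphere_instance(n, rng=random):
    """Draw one function from a sphere-like function class
    f(x) = sum_i a_i * x_i**2, with each a_i uniform in [1, 2].
    Each call fixes fresh coefficients, i.e. one concrete instance."""
    a = [rng.uniform(1.0, 2.0) for _ in range(n)]

    def f(x):
        return sum(ai * xi * xi for ai, xi in zip(a, x))

    return f
```

Training draws some instances this way, and testing draws an independent, unseen set from the same class.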
To observe the performance of \(\hbox {ADAM}_{{g}}\), \(\hbox {ADM}_{{g}}\), and the human-designed mutation operators, we tested them on functions drawn from \(F_{g}\). The mean best values and standard deviations are listed in Table 7, with the best mean values in boldface. Each mean value is averaged over 50 runs. To aid comparison, we retain more decimal places for \(F_{6}\), \(F_{16}\), \(F_{17}\), \(F_{18}\), and \(F_{19}\).
We also performed the Wilcoxon signed-rank test for \(\hbox {ADAM}_{{g}}\) versus \(\hbox {ADM}_{{g}}\) and the Lévy distributions (\(\alpha \) = 1.0, 1.2, 1.4, 1.6, 1.8, 2.0); the results are given in Table 8. The Lévy distribution with \(\alpha \) = 1.0 is a Cauchy distribution, whereas \(\alpha \) = 2.0 yields a Gaussian distribution [14]. Owing to their diverse features, the different distributions perform differently on different benchmark functions.
Exchange testing for \(\hbox {ADAM}_{{g}}\) and \(\hbox {ADM}_{{g}}\) on each function class
The exchange testing evaluates the performance of \(\hbox {ADAM}_{{g}}\) and \(\hbox {ADM}_{{g}}\) on all function classes. An \(\hbox {ADAM}_{{g}}\) designed for function class \(F_{g}\) is called a tailored adaptive mutation operator, while an \(\hbox {ADAM}_{{g}}\) tested on \(F_{j}\) (\(g \not = j\)) is called a non-tailored adaptive mutation operator; likewise, an \(\hbox {ADM}_{{g}}\) designed for \(F_{g}\) is a tailored non-adaptive mutation operator, and an \(\hbox {ADM}_{{g}}\) tested on \(F_{j}\) is a non-tailored one. For example, \(\hbox {ADAM}_{{1}}\) is a tailored adaptive mutation operator for \(F_{1}\) but a non-tailored adaptive mutation operator for \(F_{2}\); \(\hbox {ADM}_{{1}}\) is a tailored non-adaptive mutation operator for \(F_{1}\) but a non-tailored non-adaptive mutation operator for \(F_{2}\). To observe the performance of both tailored (or dedicated) and non-tailored (or non-dedicated) \(\hbox {ADAM}_{{g}}\) and \(\hbox {ADM}_{{g}}\) on \(F_{j}\), we tested \(\hbox {ADAM}_{{g}}\) and \(\hbox {ADM}_{{g}}\) on \(F_{j}\) over 50 runs, with a function generated from the function class in each run; 50 functions are used for \(\hbox {ADAM}_{{g}}\) and 50 for \(\hbox {ADM}_{{g}}\), i.e. 100 functions in total. The mean best values and standard deviations are given in Tables 11 and 12. We display more decimal places for \(F_{6}\), \(F_{16}\), \(F_{17}\), \(F_{18}\), and \(F_{19}\), as the results are otherwise too close to distinguish. The values of the tailored ADAM and ADM are in boldface, and values better than those of the tailored ADAM and ADM are also in boldface.
Analysis of the performance of \(\hbox {ADAM}_{{g}}\) and \(\hbox {ADM}_{{g}}\)
In this section, we compare \(\hbox {ADAM}_{{g}}\), \(\hbox {ADM}_{{g}}\), and the human-designed mutation operators. In the experiments, both \(\hbox {ADAM}_{{g}}\) and \(\hbox {ADM}_{{g}}\) perform better than the human-designed mutation operators on \(F_{g}\), and \(\hbox {ADAM}_{{g}}\) achieves outstanding performance on most \(F_{g}\).
In Table 7, \(\hbox {ADAM}_{{7}}\), \(\hbox {ADAM}_{{12}}\), \(\hbox {ADAM}_{{19}}\), and \(\hbox {ADAM}_{{20}}\) are the exceptions: \(\hbox {ADM}_{{7}}\), \(\hbox {ADM}_{{12}}\), and \(\hbox {ADM}_{{20}}\) achieve better performance than \(\hbox {ADAM}_{{7}}\), \(\hbox {ADAM}_{{12}}\), and \(\hbox {ADAM}_{{20}}\), while \(\hbox {ADAM}_{{19}}\) and \(\hbox {ADM}_{{19}}\) perform the same. In these particular cases, the selected adaptive factors may not match \(F_{7}\), \(F_{12}\), \(F_{19}\), and \(F_{20}\) well. In the future, we will analyse and collect more adaptive factors for EP, and use the new factors to design ADAMs for \(F_{7}\), \(F_{12}\), \(F_{19}\), and \(F_{20}\).
Table 8 lists the results of the Wilcoxon signed-rank test at the 5% significance level, comparing a tailored \(\hbox {ADAM}_{{g}}\) with \(\hbox {ADM}_{{g}}\) and the human-designed mutation operators, i.e. the Lévy distributions (with \(\alpha \) = 1.0, 1.2, 1.4, 1.6, 1.8, 2.0). In this table, ‘\(\ge \)’ indicates that \(\hbox {ADAM}_{{g}}\) performs better on \(F_{g}\) than \(\hbox {ADM}_{{g}}\) or the human-designed mutation operators on average, ‘>’ means that the difference is statistically significant, ‘\(=\)’ means that there is no difference, and ‘\(\le \)’ indicates that \(\hbox {ADAM}_{{g}}\) performs worse on \(F_{g}\) than \(\hbox {ADM}_{{g}}\). In the majority of cases, \(\hbox {ADAM}_{{g}}\) outperforms \(\hbox {ADM}_{{g}}\) and the human-designed mutation operators.
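For reference, the test statistic behind such a comparison can be computed from paired mean best values roughly as follows (a stdlib-only sketch using the normal approximation for the p-value; a real analysis would typically use an exact implementation such as SciPy's `scipy.stats.wilcoxon`):

```python
import math


def wilcoxon_signed_rank(a, b):
    """Wilcoxon signed-rank statistic W+ for paired samples a, b, with a
    two-sided p-value from the normal approximation.  Zero differences
    are dropped; tied |differences| receive averaged ranks."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    ranked = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                      # average the ranks over tie groups
        j = i
        while j + 1 < n and abs(diffs[ranked[j + 1]]) == abs(diffs[ranked[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[ranked[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mu = n * (n + 1) / 4.0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mu) / sigma
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return w_plus, p
```

A p-value below 0.05 then corresponds to the statistically significant ‘>’ entries in the table.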
Table 11 lists the experimental results of testing all the ADAMs given in Table 9 on all function classes. ‘N/A’ means that \(\hbox {ADAM}_{{g}}\) may generate values outside the range of the machine’s representation for non-tailored function classes in certain generations. From this table, we find that tailored ADAMs achieve much better performance than non-tailored ADAMs on \(F_{j}\) in most cases. The reason is as follows: ADAMs can make different jump sizes in different EP generations; however, a non-tailored ADAM usually does not fit the function class, and once it fails to search the entire space in the early generations, it falls into local optima with no ability to subsequently jump out and escape them, as the values it generates are probably too small to allow individuals to jump out. In the future, we can search for smarter adaptive factors to design ADAMs that avoid falling into local optima early, or establish a mechanism to avoid this problem, such as occasionally importing long step size mutation operators in later EP generations.
The results in Tables 11 and 12 demonstrate that \(\hbox {ADAM}_{{1}}\), \(\hbox {ADAM}_{{3}}\), \(\hbox {ADAM}_{{4}}\), \(\hbox {ADAM}_{{5}}\), \(\hbox {ADAM}_{{6}}\), \(\hbox {ADAM}_{{8}}\), \(\hbox {ADAM}_{{9}}\), \(\hbox {ADAM}_{{10}}\), \(\hbox {ADAM}_{{11}}\), \(\hbox {ADAM}_{{14}}\), \(\hbox {ADAM}_{{16}}\), \(\hbox {ADAM}_{{17}}\), \(\hbox {ADAM}_{{18}}\), \(\hbox {ADAM}_{{19}}\), \(\hbox {ADAM}_{{21}}\), \(\hbox {ADAM}_{{22}}\), and \(\hbox {ADAM}_{{13}}\) achieve the best performance when acting as tailored operators, i.e. acting on their own \(F_{g}\). \(\hbox {ADAM}_{{2}}\), \(\hbox {ADAM}_{{7}}\), \(\hbox {ADAM}_{{12}}\), \(\hbox {ADAM}_{{13}}\), \(\hbox {ADAM}_{{15}}\), and \(\hbox {ADAM}_{{20}}\) achieve satisfactory performance, albeit not the best.
A large variation in a single mutation enables EP to globally search a wider area of the search space [14]. Researchers have attempted to design mutation strategies that generate larger variations in the early EP generations, especially when local optima have deep and wide basins, and smaller variations in the later EP generations. Alternatively, a mutation strategy can generate smaller variations in the early EP generations and larger variations in the later EP generations, so that if the search falls into a local optimum early on, it still has a chance to jump out later in the evolutionary process. The ADAMs can perform well on functions generated from function classes.
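The first schedule, large steps early and small steps late, can be sketched as a Cauchy mutation whose scale decays linearly with the generation index. This is an illustrative hand-written operator with parameters of our own choosing, not one of the GP-evolved ADAMs from the paper:

```python
import math
import random

def adaptive_cauchy_mutation(x, gen, max_gen, s0=1.0, s_min=1e-3):
    """Mutate a real vector x with Cauchy noise whose scale shrinks
    linearly from s0 (generation 0) towards s_min (final generation)."""
    scale = s_min + (s0 - s_min) * (1.0 - gen / max_gen)
    # Standard Cauchy deviate via the inverse CDF: tan(pi * (U - 1/2)).
    return [xi + scale * math.tan(math.pi * (random.random() - 0.5))
            for xi in x]

random.seed(0)
early = adaptive_cauchy_mutation([0.0] * 5, gen=1, max_gen=1000)
late = adaptive_cauchy_mutation([0.0] * 5, gen=999, max_gen=1000)
# Typical (not guaranteed) outcome: early jumps are orders of magnitude larger.
```

The heavy tail of the Cauchy distribution preserves occasional long jumps even as the scale shrinks, which matches the observation in [11] that long jumps help early in the search and hurt near convergence.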
In Figs. 2, 3 and 4, \(\hbox {ADAM}_{{3}}\), \(\hbox {ADAM}_{{4}}\), \(\hbox {ADAM}_{{11}}\), \(\hbox {ADAM}_{{13}}\), \(\hbox {ADAM}_{{21}}\), \(\hbox {ADAM}_{{22}}\), and \(\hbox {ADAM}_{{23}}\) perform notably well for EP. \(\hbox {ADAM}_{{2}}\), \(\hbox {ADAM}_{{6}}\), \(\hbox {ADAM}_{{12}}\), and \(\hbox {ADAM}_{{14}}\) do not perform well in the early EP generations; however, they accelerate the convergence rate in the later EP generations because of the effectiveness of the adaptive factors. The experiments also demonstrate that these adaptive factors have a positive impact, especially in the later EP generations.
Let us analyse the ADAMs further. In Fig. 3, \(\hbox {ADAM}_{{12}}\) shows the worst performance among all mutation operators in the early EP generations (where the generation index is less than approximately 500). This situation changes in the later generations. The adaptive factors in \(\hbox {ADAM}_{{12}}\) do not help EP establish efficient convergence in the early generations. The values in Table 13 show that the random numbers generated by \(\hbox {ADAM}_{{12}}\) are large in the early EP generations and much smaller in the later generations. The fit of \(\hbox {ADAM}_{{12}}\) clearly improves in the later EP generations, as the jump sizes the mutation operator makes become significantly more precise due to the variability of the mutation operator. We also notice that the convergence curve of \(\hbox {ADM}_{{22}}\) shows a ‘punctuated equilibria’ pattern, as \(F_{22}\) is a multimodal function class with a few local optima: the evolutionary process makes no improvement in some generations.
For further discussion, consider two components \(x^{g}(j)\) of an individual, \(j = 1, 2\), as below:
Then, let us use the ADAMs to generate variants for \(Y_{j}^{k}\); we plot the variables \(x^g\)(1) and \(x^g\)(2) in Figs. 5, 6 and 7 for each ADAM at four different generations: 1, 250, 500, 1000, or alternatively 1, 25, 50, 100. The figures demonstrate that the sizes of the jumps a mutation operator makes differ across EP generations. For example, in the early EP generations the ADAMs search a wider space, which can be seen as a global search of the space, whereas in the later EP generations they search a very small space. The ADAMs thus search widely different spaces in the early and later EP generations, and this feature leads them to outperform the ADMs and the human-designed mutation operators.
We use the ADAMs to generate 3000 random values for different EP generations. The minimum, maximum, and mean values are given in Tables 13 and 14. The values generated by the ADAMs differ widely across generations. In most cases, the minimum and mean values are much smaller for large generation indices; thus, the ADAMs search a wider space in the early EP generations and a smaller space in the later EP generations.
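Per-generation statistics of this kind can be gathered by drawing from an operator at fixed generation indices and recording the minimum, mean, and maximum. The sketch below uses a hypothetical decaying-scale Gaussian as a stand-in, since the paper's ADAMs are GP-evolved expressions; the function names and the decay schedule are our own:

```python
import random
import statistics

def sample_operator_stats(draw, generations, n=3000, seed=0):
    """For each generation index, draw n random values from the operator
    and record (min, mean, max), mirroring the layout of Tables 13 and 14."""
    rng = random.Random(seed)
    stats = {}
    for g in generations:
        vals = [draw(g, rng) for _ in range(n)]
        stats[g] = (min(vals), statistics.fmean(vals), max(vals))
    return stats

# Stand-in operator: Gaussian step whose scale halves every 100 generations.
def decaying_gaussian(gen, rng):
    return rng.gauss(0.0, 2.0 ** (-gen / 100))

table = sample_operator_stats(decaying_gaussian, [1, 250, 500, 1000])
# The spread (max - min) shrinks as the generation index grows.
```

For an operator like this, the recorded spread collapses by several orders of magnitude between generation 1 and generation 1000, which is the signature of wide early search and narrow late search described above.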
Conclusions and future work
One of the advantages of the proposed method is that it passes the responsibility for discovering ADAMs for EP from the human researcher to GP. In contrast to [7,8,9], the mutation operators discovered by the method proposed in this paper are both ‘automatically’ designed and ‘adaptive’. GP sits at the hyper-level to generate heuristics as ADAMs for EP on function classes; in other words, it tailors an ADAM to a function class. In this work, we compare two types of automatically designed mutation operators (ADAMs, with adaptive factors, and ADMs, without) as well as human-designed mutation operators. Adaptive factors may capture the current status and positions of individuals, or the landscape of the search space, and mutation operators with these factors enable EP to search in a targeted way at different stages. One disadvantage of the framework is the computational cost needed to evolve both ADAMs and ADMs, owing to the EP algorithm at the base level. However, this is a one-off upfront cost, and whether the trade-off is acceptable depends on the application, the required solution quality, and the number of times we expect to solve problems from the target class. A second disadvantage is the need for a number of training problems. In conventional optimisation, an algorithm is designed and applied to a single problem instance, and that is the entire optimisation process. In our case of automatic design, we employ a machine learning approach that requires separate sets of training and testing instances; in general, this requires a fair number of training instances, which may not be available.
A large variation in a single mutation enables EP to globally search a wider region of the search space [14]. Researchers have attempted to design mutation strategies that generate large variations in the early EP generations and small variations in the later generations, or small variations early and large variations later. The ADAMs perform well on functions generated from function classes. In Figs. 2, 3 and 4, \(\hbox {ADAM}_{{2}}\), \(\hbox {ADAM}_{{6}}\), \(\hbox {ADAM}_{{13}}\), and \(\hbox {ADAM}_{{23}}\) significantly underperform in the early EP generations, as these ADAMs are not well matched to the functions in those generations.
However, their convergence is faster in the later generations due to the adaptive factors, which prove highly effective in EP. The experimental data also demonstrate that the adaptive factors collected are useful and guide the mutation operator's search from wider regions to smaller regions. According to Figs. 5, 6 and 7 and Tables 13 and 14, an ADAM behaves in one of two ways: (1) it searches a wider space in the early generations, which can be seen as a global search, and then a very small space in the later generations; or (2) it searches a very small space early and a wider space later, as do \(\hbox {ADAM}_{{3}}\), \(\hbox {ADAM}_{{4}}\), and \(\hbox {ADAM}_{{22}}\). The first behaviour is expected; the second is more interesting. The move from narrow to wider search spaces is due to the change of the mutation operator across generations, meaning that even if the search falls into a local optimum, the ADAM retains the ability to jump out of it.
In summary, the regions searched by ADAMs vary widely between the early and later EP generations, and this feature leads to the superior performance of ADAMs over ADMs and human-designed mutation operators. The random values generated by the ADAMs vary significantly across generations: in most cases, the minimum and mean values generated by ADAMs are much smaller for large generation numbers, and thus the ADAMs search wider spaces in the early EP generations and smaller spaces in the later generations.
In this study, we use an offline hyperheuristic to automatically design ADAMs for specific function classes. Previously, researchers used offline hyperheuristics to tailor ADMs [8] to specific function classes. We conducted tests to evaluate the performance of the tailored ADAMs, the tailored ADMs, and human-designed mutation operators on specific function classes. The primary conclusions of this study are as follows. First, a tailored ADAM designed by GP can make jumps in the search space of different sizes depending on the EP generation. Second, the tailored ADAMs outperformed the tailored ADMs and the human-designed mutation operators on most function classes, and achieved the second-best performance on a few function classes. Third, an ADAM tailored to a function class performs well on that class when compared with ADAMs trained on different function classes.
In the future, several directions can be investigated. At present, although the mutation operators are adaptive, the relationships among the adaptive factors are neither considered nor designed by GP; for example, whether the minimum value and the standard deviation are connected in certain ways at different generations. It is possible to improve the framework and parameters to automatically design self-adaptive mutation operators. In the experimental tests, some ADAMs do not demonstrate outstanding performance, which means that the current adaptive factors may not fit the function classes well; thus, more adaptive factors could be incorporated and their effects investigated. Given the successful application of the work presented in this study, the idea could also be applied to other metaheuristics, including particle Metropolis-Hastings [39, 40], water wave optimisation [41], differential evolution, and particle swarm optimisation.
References
Poli R, Langdon WB, McPhee NF (2008) A field guide to genetic programming. Published via http://lulu.com and freely available at http://www.gpfieldguide.org.uk (with contributions by JR Koza)
Cowling P, Kendall G, Soubeiga E (2000) A hyperheuristic approach to scheduling a sales summit. In: Practice and theory of automated timetabling III: third international conference, PATAT 2000. LNCS. Springer
Dowsland KA, Soubeiga E, Burke E (2007) A simulated annealing based hyperheuristic for determining shipper sizes for storage and transportation. Eur J Oper Res 179(3):759–774
Ochoa G, Walker J, Hyde M, Curtois T (2012) Adaptive evolutionary algorithms and extensions to the HyFlex hyperheuristic framework. In: Parallel problem solving from nature—PPSN XII, volume 7492 of Lecture notes in computer science. Springer, Berlin, pp 418–427
Pisinger D, Ropke S (2007) A general heuristic for vehicle routing problems. Comput Oper Res 34(8):2403–2435
Shao L, Liu L, Li X (2014) Feature learning for image classification via multiobjective genetic programming. IEEE Trans Neural Netw Learn Syst 25(7):1359–1371
Hong L, Drake JH, Woodward JR, Özcan E (2018) A hyperheuristic approach to automated generation of mutation operators for evolutionary programming. Appl Soft Comput 20:162–175
Hong L, Woodward J, Li J, Özcan E (2013) Automated design of probability distributions as mutation operators for evolutionary programming using genetic programming. In: Genetic programming, volume 7831 of Lecture notes in computer science. Springer, Berlin, pp 85–96
Hong L, Drake JH, Woodward JR, Özcan E (2016) Automatically designing more general mutation operators of evolutionary programming for groups of function classes using a hyperheuristic. In: Proceedings of the genetic and evolutionary computation conference 2016, GECCO ’16. ACM, New York, NY, USA, pp 725–732
Ross P, Schulenburg S, Marín-Blázquez JG, Hart E (2002) Hyperheuristics: learning to combine simple heuristics in bin-packing problems. In: Proceedings of the genetic and evolutionary computation conference, GECCO ’02. Morgan Kaufmann, San Francisco, CA, USA, pp 942–948
Yao X, Liu Y, Lin G (1999) Evolutionary programming made faster. IEEE Trans Evol Comput 3:82–102
Hong L, Drake JH, Özcan E (2014) A step size based self-adaptive mutation operator for evolutionary programming. In: Proceedings of the genetic and evolutionary computation conference 2014. ACM, pp 1381–1388
Dong H, He J, Huang H, Hou W (2007) Evolutionary programming using a mixed mutation strategy. Inf Sci 177:312–327
Lee CY, Yao X (2004) Evolutionary programming using mutations based on the lévy probability distribution. IEEE Trans Evol Comput 8:1–13
Mallipeddi R, Suganthan PN (2008) Evaluation of novel adaptive evolutionary programming on four constraint handling techniques. In: Proceedings of IEEE world congress on computational intelligence, pp 4045–4052
Mallipeddi R, Mallipeddi S, Suganthan PN (2010) Ensemble strategies with adaptive evolutionary programming. Inf Sci 180(9):1571–1581
Shen L, He J (2010) A mixed strategy for evolutionary programming based on local fitness landscape. In: 2010 IEEE congress on evolutionary computation (CEC), pp 1–8
Burke EK, Hyde M, Kendall G, Ochoa G, Özcan E, Woodward JR (2010) A classification of hyperheuristic approaches. Handbook of metaheuristics. Springer, New York, pp 449–468
Chellapilla K (1998) Combining mutation operators in evolutionary programming. IEEE Trans Evol Comput 2(3):91–96
Liang KH, Yao X, Liu Y, Newton C, Hoffman D (1998) An experimental investigation of self-adaptation in evolutionary programming, pp 291–300
Liu Y (2007) Operator adaptation in evolutionary programming. In: International conference on advances in computation and intelligence. Springer, Berlin, pp 90–99
Dong H, He J, Huang H, Hou W (2007) Evolutionary programming using a mixed mutation strategy. Inf Sci 177(1):312–327
Yao X, Liu Y (1996) Fast evolutionary programming. In: Proceedings of the 5th annual conference on evolutionary programming. MIT Press, pp 451–460
Bäck T, Schwefel HP (1993) An overview of evolutionary algorithms for parameter optimization. Evol Comput 1:1–23
Santos VGM, de Carvalho MAM (2018) Adaptive large neighborhood search applied to the design of electronic circuits. Appl Soft Comput 73:14–23
Zakian P, Kaveh A (2018) Economic dispatch of power systems using an adaptive charged system search algorithm. Appl Soft Comput 73:607–622
AlSharman MK, Emran BJ, Jaradat MA, Najjaran H, AlHusari R, Zweiri Y (2018) Precision landing using an adaptive fuzzy multisensor data fusion architecture. Appl Soft Comput 69:149–164
Wang L, Pei J, Wen Y, Pi J, Fei M, Pardalos PM (2018) An improved adaptive human learning algorithm for engineering optimization. Appl Soft Comput 71:894–904
Mashwani WK, Salhi A, Yeniay O, Jan MA, Khanum RA (2017) Hybrid adaptive evolutionary algorithm based on decomposition. Appl Soft Comput 57:363–378
Cai Y, Sun G, Wang T, Tian H, Chen Y, Wang J (2017) Neighborhoodadaptive differential evolution for global numerical optimization. Appl Soft Comput 59:659–706
Aziz NAA, Ibrahim Z, Mubin M, Nawawi SW, Mohamad MS (2018) Improving particle swarm optimization via adaptive switching asynchronoussynchronous update. Appl Soft Comput 72:298–311
Wang G, Tan Y (2019) Improving metaheuristic algorithms with information feedback models. IEEE Trans Cybern 49(2):542–555
Li C, Yang S, Korejo I (2008) An adaptive mutation operator for particle swarm optimization. In: Proceedings of the 2008 UK workshop on computational intelligence, pp 165–170
Khurana M, Massey K (2015) Swarm algorithm with adaptive mutation for airfoil aerodynamic design. Swarm Evol Comput 20:1–13
Islam SM, Das S, Ghosh S, Roy S, Suganthan PN (2012) An adaptive differential evolution algorithm with novel mutation and crossover strategies for global numerical optimization. IEEE Trans Syst Man Cybern Part B (Cybern) 42(2):482–500
Luke S, Panait L (2002) Lexicographic parsimony pressure. In: Proceedings of genetic and evolutionary computation conference 2002. Morgan Kaufmann Publishers, pp 829–836
Silva S (2007) GPLAB—a genetic programming toolbox for MATLAB. University of Coimbra, Portugal. http://gplab.sourceforge.net/download.html
Silva S, Almeida J (2003) GPLAB—a genetic programming toolbox for MATLAB. In: Proceedings of the Nordic MATLAB conference 2003, pp 273–278
Martino L, Elvira V, Camps-Valls G (2018) Distributed particle Metropolis-Hastings schemes, pp 553–557
Martino L, Elvira V, Camps-Valls G (2018) Group importance sampling for particle filtering and MCMC. Digit Signal Process 82:133–151
Zhang J, Zhou Y, Luo Q (2018) An improved sine cosine water wave optimization algorithm for global optimization. J Intell Fuzzy Syst 34(4):2129–2141
Ethics declarations
Conflict of interest
On behalf of all the authors, Fuchang Liu states that there is no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Hong, L., Woodward, J.R., Özcan, E. et al. Hyperheuristic approach: automatically designing adaptive mutation operators for evolutionary programming. Complex Intell. Syst. 7, 3135–3163 (2021). https://doi.org/10.1007/s40747-021-00507-6
Keywords
 Hyperheuristic
 Evolutionary programming
 Genetic programming
 Adaptive mutation