1 Introduction

Despite having a number of advantages over classical gradient-based techniques, the performance of evolutionary algorithms depends both on the problem to be optimised and on the algorithm being used (Wolpert and Macready 1997). To make matters worse, this performance also depends heavily on the selection of algorithm-specific control parameters. This variability of performance makes the field hard to penetrate for users in industry who simply want to use an algorithm to solve a problem. Often the problem they wish to solve is not well understood before they start to solve it, which makes selecting an algorithm and control parameters all the more difficult. The motivation of the work presented in this paper is to automate this selection using simple machine learning techniques. Specifically, the aim is to automatically select an effective set of control parameters for differential evolution for an unknown problem.

1.1 Terminology

The problem to be optimised is termed the objective function. This paper focusses on optimising continuous black-box objective functions. We identify a number of features, \(\varvec{\beta }\), by which an objective function can be described. An optimisation algorithm instance is determined by its control parameters \(\mathbf {p}\). Our aim is to classify objective functions using their features, in order to predict a set of effective control parameters which will result in a high-performing algorithm for a particular objective function.

1.2 Background

When applying an evolutionary algorithm to a new application it is common to use the control parameters suggested in literature. These parameters are usually obtained from extensive studies on algorithm behaviour using suites of benchmark optimisation problems. Parameters which work well on common problem test suites will emerge (Eiben and Smit 2011) and this single set will end up being used in the majority of applications. The problem is that with truly novel applications there may be no understanding of which test suites, if any, correctly represent the real world problem. Strictly speaking, each time an algorithm is applied to a new application a parameter study should be undertaken, both to provide insight into the robustness of the parameters and perhaps to squeeze out some additional performance. The reality is that these studies are often infeasible in real applications, where a single objective function evaluation may represent hours, or days, of computational time (Naumann et al. 2015; Walton et al. 2013a, b, 2015). Thus a great deal of research has been undertaken to address this problem. We have identified three interrelated strands of research in the meta-heuristic optimisation community relevant to this problem. These are briefly discussed below; we then place our own approach in this context.

1.2.1 Automatic tuning algorithms based on performance modelling

A considerable body of work has shown that it is possible to build empirical performance models of algorithms (Hutter et al. 2014). These models can then be used to select tuning parameters with good predicted performance (Hutter et al. 2006). Sequential model-based optimization for general algorithm configuration (SMAC) (Hutter et al. 2011) and sequential parameter optimization (SPO) (Bartz-Beielstein et al. 2005) are both specific examples of this approach. In the case of SPO the approach facilitates manual tuning whereas SMAC is automated.

1.2.2 Feature based approaches

It is increasingly argued that we need to understand and use the characteristics and features of a problem to select a suitable algorithm, or to tune it (Smith-Miles 2008). Feature based algorithm configuration (FBAC) (Belkhir et al. 2016) can be thought of as an extension of the automatic tuning algorithms mentioned above. It uses sophisticated objective function features to classify objective functions and can accurately predict performance models for objective functions, which could, in theory, be used to determine an effective set of control parameters. However, the features used require a large number of samples of the objective function to calculate, which would lead to an excessive computational cost in real applications. Exploratory landscape analysis (ELA) (Mersmann et al. 2011) introduces ten features, which are relatively cheap to calculate and can be used to classify objective functions. These features are grouped into five classes which relate to different characteristics of objective functions. Promising results have been presented whereby the ELA features are used to train a one-sided support vector regression model to select an appropriate optimisation algorithm (Kerschke et al. 2016).

1.2.3 Adaptive algorithms

The most common strategy to address the problem of performance variability is to design algorithms with self-adaptive control parameters. In such algorithms the control parameters are themselves optimised, based on current performance, as the algorithm runs (Sarker et al. 2014; Zamuda and Brest 2015; Guo et al. 2014). A related field is hyper-heuristics whose goal is to automate the design of heuristic optimisation algorithms based on current performance (Burke et al. 2013; Li and Kendall 2015). These strategies are performed on a per-objective function basis and do not use knowledge of objective function features, or past performance on different objective functions.

1.2.4 Case study optimisation algorithm: Differential Evolution

To show the effectiveness of our approach we must select a single optimisation algorithm; Differential Evolution (DE) (Storn and Price 1997) will be used to test the predictive methodology. It is stressed that the approach is independent of the evolutionary algorithm, although some thought would be required if an algorithm has non-continuous control parameters. DE is popular and its control parameters are well studied. The algorithm is aimed at nonlinear, non-differentiable continuous functions and was designed as a direct stochastic search method. The method has a small number of control parameters and applies crossover and mutation operators based on the differences between randomly selected individuals of the population.

There are a number of alternative DE methods and many additions have been made to the algorithm. It is beyond the scope of this paper to explain these additions in detail, so instead we describe the algorithm used in this study and allow the reader to find detailed explanation in original papers.

To select new members of the population, a direct one-to-one competition scheme is employed in each generation. From the population of the current generation, a target member, \({\mathbf {x}}_{i,g}\), is selected, where i refers to the member’s number and g to the generation. A donor vector, \({\mathbf {v}}_{i,g}\), is generated using the current-to-pbest/1/bin approach (Zhang and Sanderson 2009). Three further members of the population, distinct from the target member, are selected as described below, and \({\mathbf {v}}_{i,g}\) is calculated according to the relation

$$\begin{aligned} {\mathbf {v}}_{i,g} = {\mathbf {x}}_{i,g} + p_{2}({\mathbf {x}}_{pbest,g} - {\mathbf {x}}_{i,g}) + p_{2}({\mathbf {x}}_{r1,g} - {\mathbf {x}}_{r2,g}) \end{aligned}$$
(1)

where \(p_{2}\) is a control parameter usually referred to as the weighting factor. \({\mathbf {x}}_{r1,g}\) and \({\mathbf {x}}_{r2,g}\) are two members selected at random from the whole population and \({\mathbf {x}}_{pbest,g}\) is randomly selected from the top \(q \times p_{3}\) members \((q \in [0,1])\). \(p_{3}\) is the population size, or number of parents. q is a control parameter which controls the greediness of the algorithm; to eliminate this parameter it is randomised as in the success-history based parameter adaptation for differential evolution (SHADE) algorithm (Tanabe and Fukunaga 2013). In addition, an external archive of previous members of the population is maintained and used to generate \({\mathbf {x}}_{r2,g}\) (Tanabe and Fukunaga 2013).

A crossover operator is applied to the target and donor vectors to form a trial vector. The elements of the target and donor vectors enter the trial vector with a probability \(p_{1}\), a control parameter usually referred to as the crossover constant. The target vector is compared with the trial vector and the vector with the best fitness value is selected for admission into the next generation. This iteration scheme repeats until a suitable stopping criterion is met (Storn and Price 1997).
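To make the generation step concrete, a minimal sketch of one DE generation with current-to-pbest/1/bin mutation, binomial crossover and one-to-one selection is given below. It is illustrative only: the external archive is omitted, and the function handle fn, the greediness fraction q and the helper structure are our assumptions rather than the implementation used in this study.

```python
import numpy as np

def de_generation(fn, pop, fitness, p1, p2, q, rng):
    """One generation of DE with current-to-pbest/1/bin mutation and
    one-to-one selection. p1: crossover constant, p2: weighting factor,
    q: greediness fraction. The external archive is omitted for brevity."""
    n, d = pop.shape
    order = np.argsort(fitness)                      # ascending: best member first
    n_best = max(1, int(np.ceil(q * n)))
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(n):
        pbest = pop[rng.choice(order[:n_best])]      # random member of the top q fraction
        r1, r2 = rng.choice([j for j in range(n) if j != i], size=2, replace=False)
        donor = pop[i] + p2 * (pbest - pop[i]) + p2 * (pop[r1] - pop[r2])   # Eq. (1)
        mask = rng.random(d) < p1                    # binomial crossover
        mask[rng.integers(d)] = True                 # guarantee at least one donor element
        trial = np.where(mask, donor, pop[i])
        f_trial = fn(trial)
        if f_trial <= fitness[i]:                    # one-to-one competition
            new_pop[i], new_fit[i] = trial, f_trial
    return new_pop, new_fit

# Illustrative usage (fn, pop and fit are assumed to be defined elsewhere):
# rng = np.random.default_rng(0)
# pop, fit = de_generation(fn, pop, fit, p1=0.9, p2=0.5, q=0.1, rng=rng)
```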

DE has been applied, with success, to the fields of electrical power systems, electromagnetic engineering, control systems and robotics, chemical engineering, pattern recognition, artificial neural networks and signal processing (Das and Suganthan 2011). Storn (2016) suggests using the control parameters \(p_{1}=0.900\), \(p_{2}=0.500\) and \(p_{3}=10D\), where D is the number of dimensions of the function. The effect of these parameters on algorithm performance is a well-researched subject; for example, there appear to be complex relationships between problem dimensionality and the most appropriate population size (Piotrowski 2016).

We compare our proposed predictive technique to a state-of-the-art adaptive technique: SHADE (Tanabe and Fukunaga 2013). This technique uses a historical memory of control parameters which have performed well to guide the selection of control parameters in each generation. In the original study it was shown to be competitive with other state-of-the-art algorithms on the CEC 2005 benchmarks, which are also used in this study. All control parameters used in our study are the same as those used in the original SHADE study (Tanabe and Fukunaga 2013).

1.3 Contribution and motivation of this paper

The approach we have adopted is to select three simple-to-calculate features and use these to classify objective functions. Then, as we optimise a series of objective functions, a global memory of the performance of various control parameters for each classification is stored. This information is then used to adapt control parameters for future optimisations. We do not create a performance model but directly use prior knowledge to adapt the optimisation algorithm. Thus our approach falls under the adaptive algorithm category, and hence we compare our strategy to other adaptive strategies below. Our approach also falls under the feature based category, since we are using objective function features to drive our adaptation. Our features are much simpler, and cruder, than those used in FBAC (Belkhir et al. 2016) and we use fewer than those identified in ELA (Mersmann et al. 2011). Our contribution is to show that even this deliberately simple approach yields a statistically significant improvement in performance compared to algorithms which do not consider objective function features. The motivation is real-world applications where it is infeasible to tune an algorithm each time a new objective function is considered, and where the form of the objective function may be unknown, making it difficult to relate to previous analyses of control parameters.

2 Methodology

2.1 Our approach: predicting effective control parameters for evolutionary algorithms using cluster analysis of objective function features

The aim of our approach is to automatically predict an effective set of control parameters for an unknown objective function. This is achieved by classifying objective functions using three simple-to-calculate features, which are described in Sect. 2.1.2. A number of experiments are performed off-line with varying control parameters, across a range of objective functions. The algorithm performance is measured and recorded for each experiment; the performance metric used is described in Sect. 2.1.1. Functions are split into classifications using the unsupervised machine learning technique k-means++ (Arthur and Vassilvitskii 2007). All the experiments in a particular classification are ranked by performance and the mean values of the control parameters used in the top 10% of experiments are calculated. When a new function is to be optimised it is sampled, on-line, and its features are calculated. It is then classified, and the mean values calculated for its classification are used to optimise it.

2.1.1 Optimisation algorithm performance metric

There are a number of metrics which can be utilised to define the performance of an optimisation algorithm (Eiben and Smit 2011; Belkhir et al. 2016). The meaning of performance may change depending on the application (López-Ibáñez et al. 2016), but in general we wish to reduce the objective function value with a small number of objective function evaluations. In this work a performance metric, \(\alpha \), is defined as

$$\begin{aligned} \alpha = \frac{100(F_{1}-F_{G})}{F_{1}N_{g}} \end{aligned}$$
(2)

where \(F_{1}\) is the lowest objective function value in the first generation, \(F_{G}\) is the lowest objective function value in the final generation, G, and \(N_{g}\) is the total number of function evaluations performed up to and including generation g. Generation g is the first generation at which the reduction in the objective function reaches \(99\%\) of the total reduction i.e.

$$\begin{aligned} \frac{F_{G}}{F_{g}} > 0.99 \end{aligned}$$
(3)

This choice is justified as follows. In practice, an evolutionary optimisation algorithm is run until a maximum number of objective function evaluations is reached or a predetermined accuracy, or tolerance, is achieved. Dividing by \(N_{g}\) means that \(\alpha \) gives us information on the efficiency of the optimisation algorithm: an algorithm which finds the optimum in the first few generations has a larger \(\alpha \) than one which only finds the optimum in the final generation. In real applications, practicalities such as objective function evaluation cost limit the number of objective function evaluations (Naumann et al. 2015; Walton et al. 2013a, b, 2015). \(\alpha \) is therefore designed to reward algorithms which exhibit rapid convergence in the first few function evaluations.

It is not claimed that \(\alpha \) is the correct metric for all situations; it is a choice that depends on user requirements. In this study an attempt is made to model a situation where an engineer wishes to apply an optimisation algorithm to a real problem. One can imagine that such an engineer would simply select an algorithm and use the set of control parameters suggested in literature. In the authors' experience of applying optimisation algorithms to engineering applications, the proposed metric \(\alpha \) is relevant for many engineers. Control parameters suggested in literature may not have been tuned with this metric in mind; despite this, the engineer would likely use these parameters. In the results section convergence curves are presented to show the effect of this metric choice.
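For illustration, a minimal sketch of how \(\alpha \) could be computed from a recorded convergence history is given below; the argument names and the per-generation history format are assumptions, not part of the original implementation.

```python
def performance_alpha(best_per_gen, evals_per_gen):
    """Compute the performance metric alpha of Eq. (2).

    best_per_gen[k]: lowest objective value found up to generation k+1.
    evals_per_gen[k]: cumulative number of function evaluations after
    that generation. Generation g is the first at which F_G / F_g > 0.99,
    following Eq. (3)."""
    F1, FG = best_per_gen[0], best_per_gen[-1]
    g = next(k for k, Fg in enumerate(best_per_gen) if FG / Fg > 0.99)
    Ng = evals_per_gen[g]
    return 100.0 * (F1 - FG) / (F1 * Ng)             # Eq. (2)
```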

2.1.2 Objective function features

Functions are often described using features such as symmetry, smoothness, condition number or separability. It is well understood that these features affect the performance of optimisation algorithms. The challenge, therefore, is to formulate a set of features that can be calculated with a small number of objective function evaluations.

In this proof-of-concept study, the starting point for calculating these features will be a Latin hypercube sampling of the objective function search space. The number of samples taken is referred to as \(\sigma \). This sampling will be performed prior to the optimisation in this study, but in future it could also be used as the first generation of the optimisation algorithm. The objective function values in this sampling are first normalised by subtracting the mean and dividing by the standard deviation. Three simple features, listed below, have been selected to test the methodology.

  1. \(\beta _{1}\) is the number of dimensions of the function, which is known to strongly affect algorithm performance.

  2. \(\beta _{2}\) is the interquartile range of the normalised data, which provides information on function variation within the domain. This feature will identify functions which have a largely flat topology. It corresponds to a feature relating to curvature in Mersmann et al. (2011).

  3. \(\beta _{3}\) is the skew of the normalised data. The skew tells us how the function values are distributed: a skew of zero indicates a symmetric distribution, whereas positive and negative values indicate a tailed distribution. This feature could potentially identify functions with a sharp optimum as well as give information regarding function symmetry. It corresponds to a feature relating to y-distribution in Mersmann et al. (2011).

Collectively these features make up the characteristics, \(\varvec{\beta }\), of a particular objective function.
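A minimal sketch of this feature computation, assuming SciPy's Latin hypercube sampler and a black-box objective fn with known bounds, might look as follows; it is illustrative rather than the authors' code.

```python
import numpy as np
from scipy.stats import iqr, skew, qmc

def objective_features(fn, lower, upper, sigma, seed=0):
    """Compute beta = (beta1, beta2, beta3) for a black-box objective fn.

    lower/upper: bounds of the search domain, sigma: number of Latin
    hypercube samples. The sampled values are normalised to zero mean and
    unit standard deviation before beta2 and beta3 are computed."""
    d = len(lower)                                   # beta1: dimensionality
    sampler = qmc.LatinHypercube(d=d, seed=seed)
    x = qmc.scale(sampler.random(sigma), lower, upper)
    y = np.array([fn(xi) for xi in x])
    y = (y - y.mean()) / y.std()                     # normalise the sampled values
    return np.array([d, iqr(y), skew(y)])            # (beta1, beta2, beta3)
```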

2.1.3 Control parameter selection

DE requires a number of control parameters, stored in the vector \(\mathbf {p}\), which defines a single instance of the algorithm. Running many \(\mathbf {p}\) on many objective functions results in a set of data points of the form \((\mathbf {p},\varvec{\beta },\alpha )\). These data points form the training data and are used to exploit any relationships between the control parameters, function features and performance.

The approach adopted is to apply the unsupervised clustering algorithm k-means++ (Arthur and Vassilvitskii 2007). The k-means algorithm takes an unlabelled data set and classifies it into a user-specified number of groups, \(\kappa \). Each group is defined by a cluster centroid; a data point belongs to the group whose centroid it lies closest to. The k-means++ variant of the algorithm carefully initialises these centroids rather than placing them at random (Arthur and Vassilvitskii 2007).

Using the training set, the objective functions are classified by applying k-means++ to the \(\varvec{\beta }\) data points. For each classification the data points are sorted by \(\alpha \). The top \(10\%\) of data points are identified, and the mean \(\mathbf {p}\) calculated from that set is used to optimise new functions identified as belonging to that classification.

At the end of each optimisation the new data point, \((\mathbf {p},\varvec{\beta },\alpha )\), from that run is appended to the training data and the k-means++ algorithm is run to update the classifications and redetermine the best control parameters for each new centroid. The key idea is that the memory of well-performing parameters is extended from a single optimisation run to the entire history of using the algorithm.
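The following sketch illustrates the clustering and look-up steps just described, using scikit-learn's KMeans (which uses the k-means++ initialisation); the function names, array-based inputs and the handling of the 10% threshold are our assumptions rather than the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_predictor(betas, params, alphas, kappa, top_fraction=0.1):
    """Cluster the training features (NumPy arrays) and store, for each
    classification, the mean of the control parameters used in the top 10%
    of experiments ranked by alpha."""
    km = KMeans(n_clusters=kappa, init="k-means++", n_init=10).fit(betas)
    best_params = {}
    for c in range(kappa):
        idx = np.where(km.labels_ == c)[0]
        n_top = max(1, int(top_fraction * len(idx)))
        top = idx[np.argsort(alphas[idx])[::-1][:n_top]]   # highest alpha first
        best_params[c] = params[top].mean(axis=0)
    return km, best_params

def predict_params(km, best_params, beta_new):
    """Classify a new objective function by its features and return the
    recommended control parameters for its cluster."""
    cluster = int(km.predict(np.asarray(beta_new).reshape(1, -1))[0])
    return best_params[cluster]
```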

2.2 Experimental methodology

2.2.1 Procedure for a single optimisation

Each time a function was optimised the optimiser was limited to 10,000 objective function evaluations. To treat our approach fairly, the on-line cost of calculating \(\varvec{\beta }\) before the optimisation takes place counts towards this number of objective function evaluations. In other words our approach has fewer objective function evaluations available when the optimisation starts. All optimisation runs were repeated 30 times, with 30 different random seeds used for all random number generation. With the same random seed, the same control parameters result in the same performance on the same function instance, which allows pairwise comparisons between different control parameters. Repeating each optimisation run with 30 different random seeds ensured that a ‘lucky’ seed benefiting a particular approach was not selected.

2.2.2 Test suites

Two established optimisation benchmark suites were used in this study.

Real-parameter black-box optimization benchmarking (BBOB) 2015 functions. The BBOB 2015 suite was used to train the predictive methodology. The 24 benchmark functions which make up the BBOB 2015 test suite are given in Finck et al. (2010) and Hansen et al. (2010). This suite includes separable functions, functions with low to high condition numbers and multi-modal functions with weak global structure. The same numbering system for the functions as in Hansen et al. (2010) is used in this paper. All of these functions are defined for an arbitrary number of dimensions and have the same search domain. The test suite includes 15 instances of each function; for each instance a combination of optimal location shifting and linear transformations is applied. Each instance is shifted and rotated in the same manner on subsequent runs, which enables direct comparison of performance. In the experiments presented here, a single run of the test suite entails optimising each function instance at a range of dimensions (2, 10, 20, 30, 40, 50) using 30 different random seeds. The resulting number of tests in a single run of the suite is then 64,800 (24 functions \(\times \) 15 instances \(\times \) 6 dimensions \(\times \) 30 seeds).

IEEE Congress on Evolutionary Computation (CEC) 2005 real-parameter optimisation benchmarks. The CEC 2005 benchmark functions, as detailed in Suganthan et al. (2005), make up the second test suite. These were used to test the effectiveness of problem-specific tuning on objective functions different from the training set. The 25 functions were used with the same numbering system as presented in the technical report (Suganthan et al. 2005). All functions were optimised at 2, 10, 30 and 50 dimensions using 30 different random seeds, resulting in a total of 3000 tests (25 functions \(\times \) 4 dimensions \(\times \) 30 seeds).

2.2.3 Statistical methodology

Using the test suites described above allows pairwise comparison of \(\alpha \) between different approaches. The approach for using nonparametric statistical tests described by Derrac et al. (2011) is followed here. The Wilcoxon signed-ranks test is used to compare the predictive methodology to other approaches. The test results in the value W, the sum of the ranks of the differences (zero differences are split between the positive and negative sums), which will be reported along with the two-sided p value. For all the statistics presented a positive W indicates that the predictive methodology has performed better than the group it is compared to, and a larger W indicates a more significant improvement.
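A sketch of this pairwise test, assuming SciPy is available, is shown below; reading W as the difference between the positive and negative rank sums (with the ranks of zero differences split between them) is our interpretation of the description above, not necessarily the exact statistic reported in the original study.

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

def compare_alpha(alpha_predictive, alpha_other):
    """Pairwise Wilcoxon signed-ranks comparison of two sets of alpha values.

    Returns W (positive when the predictive methodology performed better on
    the paired tests) and the two-sided p value. Zero differences have their
    ranks split between the positive and negative sums ('zsplit')."""
    d = np.asarray(alpha_predictive) - np.asarray(alpha_other)
    r = rankdata(np.abs(d))                          # average ranks for ties
    w_plus = r[d > 0].sum() + 0.5 * r[d == 0].sum()
    w_minus = r[d < 0].sum() + 0.5 * r[d == 0].sum()
    p = wilcoxon(d, zero_method="zsplit", alternative="two-sided").pvalue
    return w_plus - w_minus, p
```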

2.2.4 Experiments

For each function in the BBOB 2015 suite a Latin hypercube sampling of \(\mathbf {p}\) will be generated and each of these control parameter sets used to optimise that function. Since there are three control parameters, 30 sets of \(\mathbf {p}\) will be generated each time. For these optimisation runs the number of samples used to calculate \(\varvec{\beta }\) was set to \(\sigma =1000\). The resulting data will be used as the initial training set for the problem-aware tuning.
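As an illustration of this sampling step, the 30 control parameter sets could be drawn as follows; the parameter bounds shown are placeholders for illustration, not the ranges used in the study.

```python
import numpy as np
from scipy.stats import qmc

# Draw 30 Latin hypercube samples of p = (p1, p2, p3). The bounds below are
# illustrative placeholders, not the ranges used in the study.
sampler = qmc.LatinHypercube(d=3, seed=1)
lower = [0.0, 0.0, 10]    # p1: crossover constant, p2: weighting factor, p3: population size
upper = [1.0, 1.0, 200]
p_sets = qmc.scale(sampler.random(30), lower, upper)
p_sets[:, 2] = np.round(p_sets[:, 2])                # population size must be an integer
```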

There will then be four methods for selecting the DE control parameters:

  • The suggested parameters from literature,

  • SHADE,

  • The predictive methodology (using cluster analysis),

  • Using the best performing control parameters from the training set.

The predictive methodology will be applied with varying \(\sigma \) and \(\kappa \) to gauge its sensitivity to these parameters. Each time our approach is used, a new set of samples is generated to calculate \(\varvec{\beta }\), in order to simulate the use of the approach in practice. Each of these methods will be used to optimise both function suites. The non-parametric tests will then be utilised to compare the effectiveness of each method.

It needs to be stressed that, in the comparisons presented, \(\varvec{\beta }\) is recalculated and used for objective function classification in each optimisation run. This does mean that when comparing the number of function evaluations to other methods, the predictive methodology has \(\sigma \) additional evaluations. These function evaluations have been included in all measurements of performance as they indicate the cost of our methodology.

The goal of this paper is to show that features, such as those in \(\varvec{\beta }\), can be used as predictors for \(\mathbf {p}\) in order to maximise \(\alpha \). If this is the case, future research into minimising the required \(\sigma \) to effectively approximate \(\varvec{\beta }\) can be undertaken, as well as research into different definitions for \(\alpha \).

3 Results

3.1 BBOB 2015 function suite

Figure 1 shows a two-dimensional projection of the training data used in the following studies. This data resulted from optimising the BBOB 2015 function suite only. The marker colour indicates which classification each data point belongs to when \(\kappa =10\).

Fig. 1 The BBOB 2015 training data set. Markers are coloured according to their classification when \(\kappa =10\) (Color figure online)

In the following experiments the BBOB 2015 suite is both the training suite and the testing suite. This is the most basic test for our approach. It is worth pointing out that \(\varvec{\beta }\) is recalculated using new random samplings for each optimisation experiment.

3.1.1 Predictive methodology compared to picking the best from the training set

Table 1 The control parameters used in the fixed control parameter optimisations

In the following study a single set of tuning parameters, the best performing in the initial training set, was used to optimise the BBOB 2015 test suite. The control parameters used in this study are presented in Table 1. The results of the statistical tests, shown in Table 2, indicate that the predictive methodology gives a statistically significant increase in \(\alpha \) compared to using the best parameters from the training set. This improvement was achieved regardless of the values of \(\kappa \) and \(\sigma \).

Table 2 BBOB 2015 function suite: Wilcoxon signed ranks test data comparing the predictive methodology to using the best performing control parameters overall in the training set (equivalent to \(\kappa =1\))

3.1.2 Predictive methodology compared to using the best parameters from literature

Table 3 BBOB 2015 function suite: Wilcoxon signed ranks test data comparing the predictive methodology to using the control parameters most commonly used in literature

The test suite was optimised using the control parameters suggested in literature; these parameters are shown in Table 1 and are what most practitioners would use as a rule of thumb. Table 3 shows the statistical comparison between the performance of these fixed parameters and that of the predictive methodology. In all cases the predictive methodology performs significantly better. There is a jump in performance when \(\sigma \) increases from 100 to 1000, which indicates a sensitivity to the sampling of the objective functions.

3.1.3 Predictive methodology compared to SHADE

Table 4 BBOB 2015 function suite: Wilcoxon signed ranks test data comparing the predictive methodology to SHADE

The test suite was optimised using SHADE. Table 4 shows the statistical comparison between the performance of SHADE and that of the predictive methodology. In all cases the predictive methodology performs significantly better than the adaptive tuning. The improvement is more pronounced when \(\sigma =1000\).

Fig. 2 Examples of the convergence behaviour using different control parameter selection strategies for the BBOB 2015 function suite. In the predictive methodology \(\kappa =10\) and \(\sigma =1000\)

3.1.4 Convergence behaviour

In Fig. 2 convergence plots are presented for a number of functions in the BBOB 2015 suite. The objective function value is plotted against the number of function evaluations for each control parameter selection strategy. Data points were recorded at most once per generation, and only when an improvement in the objective function value was found; the number of data points therefore depends on the population size and is not the same for every curve. Each function is shown for different numbers of dimensions, and the control parameters selected by the predictive methodology are presented. These functions were selected to show a range of cases, some where the predictive methodology performs well and some where it does not. Where the predictive methodology performs well it achieves rapid convergence early in the optimisation, which is what the metric \(\alpha \) was designed to reward.

3.2 CEC 2005 function suite (predictive methodology trained using BBOB 2015)

3.2.1 Problem aware tuning compared to picking the best from the training set

The predictive methodology and the best control parameters from the initial training set were used to optimise the CEC 2005 benchmark functions. Table 5 shows the statistical tests for this comparison. For all \(\kappa \) and \(\sigma \) the predictive methodology performs significantly better, with p values \(<0.0001\). The improvement observed when optimising the CEC 2005 function suite is less significant than that observed when optimising the BBOB 2015 suite.

Table 5 CEC 2005 function suite: Wilcoxon signed ranks test data comparing the predictive methodology to using the best performing control parameters overall in the training set (equivalent to \(\kappa =1\))
Table 6 CEC 2005 function suite: Wilcoxon signed ranks test data comparing the predictive methodology to using the control parameters most commonly used in literature

3.2.2 Predictive methodology compared to using the best parameters from literature

The results of statistical tests comparing the predictive methodology to the fixed parameters suggested in literature are shown in Table 6. For all cases the predictive methodology outperformed the fixed parameters, but the increase in performance is not statistically significant when \(\kappa =100\).

Table 7 CEC 2005 function suite: Wilcoxon signed ranks test data comparing the predictive methodology to using SHADE

3.2.3 The predictive methodology compared to SHADE

The results of statistical tests comparing the predictive methodology to SHADE are shown in Table 7. For all \(\kappa \) and \(\sigma \) the predictive methodology significantly outperformed SHADE, with p values all \(<0.0001\).

3.2.4 Convergence behaviour

Figure 3 compares convergence plots for functions from the CEC 2005 suite using the different control parameter selection strategies. Functions were selected to present a range of behaviours.

Fig. 3 Examples of the convergence behaviour using different control parameter selection strategies for the CEC 2005 function suite. The predictive methodology was trained using the BBOB 2015 function suite with \(\kappa =10\) and \(\sigma =1000\)

4 Discussion

The results show that, when optimising the BBOB 2015 objective function suite, the predictive methodology outperformed both fixed and adaptive tuning parameters for DE with p values \(<0.0001\). The predictive methodology was more likely to outperform the other methodologies when the initial sampling size, \(\sigma \), of the objective functions was increased. There was a slight drop in performance from \(\kappa =10\) to \(\kappa =100\), which indicates that having fewer, larger classifications of objective function is better than a more granular approach. This trend does not continue to \(\kappa =1\), i.e. simply using the best parameters from the training set. The improvements observed from \(\kappa =1\) to \(\kappa =10\) and from \(\sigma =100\) to \(\sigma =1000\) both support the claim that \(\varvec{\beta }\) can be used as a predictor of which control parameters to use in an optimisation algorithm.

Optimising the CEC 2005 benchmark functions, using only the BBOB 2015 suite for training, is a tough test of the suitability of \(\varvec{\beta }\) to act as a predictor for control parameters in the general case. In all cases the predictive methodology outperformed the other approaches. The closest performing approach to the predictive methodology was simply using the fixed control parameters suggested in literature; in this case two tests (\(\kappa =10\), \(\sigma =1000\) and \(\kappa =100\), \(\sigma =1000\)) have p values of 0.075 and 0.528 respectively. This could be explained by the fixed control parameters not requiring objective function evaluations to adapt, i.e. all objective function evaluations are used for optimisation. The high value of \(\sigma \) for these outliers further supports this explanation: when \(\sigma <100\) the p values are \(<0.01\) when comparing to the fixed control parameters from literature. This effect is then compounded when \(\kappa =100\) which, as observed elsewhere, performs less well than \(\kappa =10\).

Overall the results show that the simple objective function features, \(\varvec{\beta }\), can act as a predictor for selecting appropriate control parameters for DE. The advantage of this approach can be observed when comparing it to SHADE. SHADE learns which control parameters are most effective during the optimisation process, whereas the predictive methodology attempts to predict effective control parameters prior to the optimisation, so the benefit is felt from the first iteration. This prediction itself comes at the cost of objective function evaluations, which were accounted for in this study. The open problem, therefore, is how to effectively approximate these features with fewer objective function evaluations. In the future, an aim is to use these objective function evaluations as the first generation of the optimisation run in order not to waste them.

To improve performance, it may be possible to update the value of \(\varvec{\beta }\) as the optimisation runs and better samples the objective function. As the approximation of \(\varvec{\beta }\) improves the control parameters could be changed mid run. This would require thoughtful implementation to avoid introducing significant computational overhead. There is also scope for designing more sophisticated and varied objective function features. Performance may also be increased with a larger training data set. This does come with an additional overhead, as the computational cost of the k-means algorithm increases with the size of the training data. In the future there is no reason why the k-means calculation could not be performed using cloud computing with training data collected from many users of the algorithm.

4.1 Conclusions

The methodology proposed has been shown to offer a statistically significant improvement over other approaches. This implementation shows that the concept has the potential to be a powerful addition to evolutionary optimisation algorithms. The method is general and could be applied to any evolutionary algorithm and any performance measure of interest. There are a number of avenues to investigate, discussed above, which may improve the methodology further; in particular, an investigation into more sophisticated function features, such as those presented in ELA (Mersmann et al. 2011). The long-term goal should be to extend this methodology to automatically select the most appropriate evolutionary algorithm for a problem, not just the control parameters. Such an automation would be of great use for industrialists wishing to apply evolutionary algorithms to real-world applications.