1 Introduction

Optimisation algorithms are developed with search techniques to find the best-fit solution within a search space. Neighbourhood functions, also known as operators, are the mathematical functions used in search algorithms to handle moves within a local neighbourhood of the search space. Within population-based problem-solving algorithms, where substantial challenges remain, the operators assist in producing new solutions that are evaluated for promotion and replacement of existing solutions. Various metaheuristic approaches employ different mechanisms to promote the produced solutions (Sotoudeh-Anvari and Hafezalkotob 2018), aiming to keep the exploration and exploitation rates as balanced as possible so that the search process remains diverse and productive. The diversity of the population of solutions and the productivity of the operators with respect to search efficiency are influenced by the structure of the neighbourhood, which can be characterised with various measures. Many studies, including Fragata et al. (2019), emphasise the characteristics of the search space and fitness landscape, from which more information can be extracted to improve promotion rules and increase success rates.

To preserve diversity and productivity, population-based algorithms designed with multiple operators need to incorporate systematic and efficient selection strategies, which have been investigated in various contexts. Adaptive operator selection appears to be a useful avenue for maintaining such diversity and richness in the search process in order to avoid potential local optima (Fialho 2010). Although it can also be implemented for individual-solution-driven algorithms, this approach is usually applied to population-based metaheuristics, i.e., evolutionary algorithms (Sun et al. 2020) and swarm intelligence algorithms (Durgut and Aydin 2021). The compelling challenge is always how to build the adaptive selection scheme and which kind of information to use when opting for the most suitable operators. A few credit-based approaches have been proposed in the literature (Durgut and Aydin 2021; Wang et al. 2014; Fialho 2010), with reported limitations. Adaptive selection can enhance performance by mapping the problem states to the operators through machine learning, where the representation of the problems plays a crucial role. Binary problem representation has been studied by researchers to accomplish general and transferable approaches at the expense of scalability (He et al. 2018; Santana et al. 2019).

Fitness landscape studies have long been attractive as a means of extracting auxiliary information to identify the search circumstances and characterise the search space; more details can be found in one of the latest reviews (Fragata et al. 2019). Such auxiliary information can be harvested for a representative and discriminating set of features to characterise the search circumstances, whereas, previously, the problem state based on binary representation was used for this purpose (Durgut and Aydin 2021; Durgut et al. 2022). However, due to its strong dependency on the problem size, the binary representation approach was not scalable across different sizes of problem instances.

The aim of this study is to open a pathway for a scalable adaptive operator selection approach through supervised machine learning techniques. A bespoke set of predictive features is expected to characterise the search space and fitness landscape in order to make the most effective choice when selecting the appropriate actions, such as activating the best-fitting/most productive neighbourhood function. It is anticipated that predictive analysis will enable us to explore the causal effects underlying how neighbourhood functions behave in generating neighbouring solutions, which may lead to better representation with respect to both solution quality and scalability. The details of predictive analysis can be found in Nyce (2007). The initial results of this study, which focused on feature analysis, were previously published in Durgut et al. (2022). This article reports the extension of that study with respect to feature analysis and the building of an adaptive operator selection (AOS) scheme with supervised machine learning approaches. The complete experimental results are reported for the proposed AOS approach integrated into the Artificial Bee Colony (ABC) swarm intelligence algorithm for solving the OneMax and set union knapsack (SUKP) problems, two popular combinatorial optimisation problems. The significance of the proposed approach is that it facilitates the transfer of acquired experience to other scenarios, including different problem types, sizes, and new problem instances. The novelty is that we propose a more adaptive approach which resolves the issues of traditional operator selection approaches while accomplishing scalability.

The rest of this paper is organised as follows: Sect. 2 provides the background and related work, while Sect. 3 presents the framework of the proposed approach in detail, including the fitness landscape information items selected for use in characterising the search circumstances and problem states in this study. Section 4 presents experiments, statistics, and other relevant analyses. Section 5 concludes the paper, outlining future work.

2 Related work

Data-driven and bottom-up approaches – using data analysis – to characterising unknown problems have been facilitated by the introduction of big data, which escalated to dealing with huge numbers of data instances and features. Search spaces in the optimisation domain are known to be unpredictable and dynamic, with the search space size increasing exponentially as the number of dimensions grows. Attempts to characterise such search spaces face the increasing computational complexity of most learning algorithms, for which the number of input features and the sample size are critical parameters. To reduce space and computational complexity, the number of features of a given problem should be reduced (Durgut et al. 2020). Many predictors benefit from the feature selection process since it reduces overfitting and improves accuracy, among other things (Chandrashekar and Sahin 2014). In the literature (Wang et al. 2017; Macias-Escobar et al. 2019), fitness landscape analysis has been shown to be an effective technique for analysing the hardness of an optimisation problem by extracting its features. Here, we review some existing approaches that are most closely related to the work proposed in this paper.

In Wang et al. (2017), the notion of population evolvability is introduced as an extension of dynamic fitness landscape analysis. The authors assume a population-based algorithm for sampling; two metrics are then defined for a population of solutions and a set of neighbours from one iteration of the algorithm. Due to the exploration process that occurs during each generation, computing population evolvability can be very expensive. To avoid a computationally intensive operation, the work suggests that the number of sampled generations must be carefully defined. In Macias-Escobar et al. (2019), a very similar approach has been proposed to apply population evolvability in a hyperheuristic, named the Dynamic Population-Evolvability based Multi-objective Hyperheuristic. In Tan et al. (2021), the authors proposed a differential evolution (DE) with an adaptive mutation operator based on fitness landscape analysis, where a random forest is implemented to select DE's mutation strategy online. Similarly, in both Sallam et al. (2017) and Sallam et al. (2020), DE is embedded with an adaptive operator selection (AOS) mechanism based on landscape analysis for continuous function optimisation problems.

A survey by Karimi-Mamaghan et al. (2022) presented the integration of ML and metaheuristics to tackle combinatorial optimisation problems. Their technical review focuses on the design of various metaheuristic elements for different purposes, such as algorithm selection, fitness assessment, initialisation, evolution, parameter setting, and cooperation. Another survey by Malan (2021) summarises recent advances in landscape analysis, including a variety of novel landscape analysis approaches and studies on sampling and measure robustness. It draws attention to landscape analysis applications for complex problems and for explaining algorithm behaviour, as well as algorithm performance prediction and automated algorithm configuration and selection. In Teng et al. (2016), the authors propose a continuous-state Markov Decision Process (MDP) model to select crossover operators based on the states during evolutionary search. For AOS, they propose employing a self-organising neural network. Unlike the reinforcement learning technique, which models AOS as a discrete-state MDP, their neural network approach is better suited to models of AOS that have continuous states and discrete actions. However, MDP-based models are usually computationally expensive due to the state space explosion problem. In Reijnen et al. (2023), the authors proposed DR-ALNS, a Deep Reinforcement Learning-based operator selection technique for the Adaptive Large Neighbourhood Search heuristic. It has been used to solve the Time-Dependent Orienteering Problem with Stochastic Weights and Time Windows, and comparable results were found against existing classical Adaptive Large Neighbourhood Search approaches. In Pei et al. (2023), to investigate and quantify the relationships among search operators, the local optima correlation (LOC), a neighbourhood relationship measurement, is developed. Empirical analysis of LOC is conducted on a range of benchmark instances for the capacitated vehicle routing problem.
Results demonstrate that a set of commonly used search operators exhibit a consistent relationship. An operator selection framework called AOS-assisted LOC is then put forth with the aim of predicting the local optima of each operator based on data from the early stages of optimisation. To improve the effectiveness of the adaptive large neighbourhood search (ALNS) metaheuristic, the authors of Johnn et al. (2023) have presented an operator selection mechanism based on Deep Reinforcement Learning. Their operator selection depends on the decision-space properties of the current solution. They have shown that it performs better than traditional Roulette Wheel and random operator selection, and that the model can be scaled to handle large problem instances by utilising Graph Neural Networks.

Most of these studies have considered population-based landscape metrics to characterise the situation, while some have considered individual-based measures. In addition, the state-of-the-art approaches in the literature have been implemented to solve function optimisation problems, which differ significantly from combinatorial problems with respect to the predictability and characterisation of the fitness landscape. Moreover, the approaches used to represent the problem states and search space are either not sufficiently representative or not scalable. In this study, we attempt to use both population- and individual-based metrics side by side to characterise the problem state – for representation purposes – evaluating the impact of each upon the prediction results. We follow up by proposing an adaptive operator selection scheme built with supervised machine learning approaches to solve two combinatorial (binary, in this case) optimisation problems, both known to be NP-Hard.

3 Materials and methods

3.1 Operator selection for swarm optimisation

A family of optimisation algorithms referred to as swarm intelligence works with operators to generate new solutions and drive the search systematically towards the optimal solution. However, operators do not always operate under suitable circumstances unless studied appropriately. This necessitates that the operator selection process be handled efficiently in order to guarantee a successful and efficient optimisation process.

Operator selection remains one of the problems that researchers attempt to solve in order to increase the effectiveness and performance of optimisation algorithms. It has an effect on population/swarm diversity, which aids in searching through a search space with an acceptable level of richness. To achieve the desired efficiency level, the operators are applied in either a random or a systematic order. The systematic order can be managed through the use of a selection scheme. For instance, genetic algorithms handle the change of operators in accordance with probabilistic rules, whereas variable neighbourhood search imposes a periodic change of operators.
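The two ordering strategies mentioned above can be contrasted in a short sketch. This is illustrative only: the operator names and weights are assumptions, not taken from any particular implementation.

```python
import random

def roulette_select(operators, weights):
    """Probabilistic operator choice, in the spirit of genetic algorithms."""
    total = sum(weights)
    r = random.uniform(0, total)
    acc = 0.0
    for op, w in zip(operators, weights):
        acc += w
        if r <= acc:
            return op
    return operators[-1]  # guard against floating-point round-off

def periodic_select(operators, iteration):
    """Periodic change of operators, in the spirit of variable neighbourhood search."""
    return operators[iteration % len(operators)]

# Hypothetical pool of four operators.
ops = ["flipABC", "nbABC", "nABC", "ibinABC"]
```

With a degenerate weight vector such as `[1, 0, 0, 0]`, the roulette wheel always returns the first operator, while the periodic scheme cycles through the pool regardless of past performance.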

An operator selection scheme assists in selecting the most appropriate operator subject to the given search circumstances. The proposed approach is sketched in Fig. 1, in which a four-stage process is depicted. The first stage, "Random Search", is the phase of generating data by running the algorithm – a bespoke implementation of Artificial Bee Colony (ABC) (Karaboga et al. 2014) – on a number of benchmark instances of the problem under consideration, e.g., OneMax or the set union knapsack problem (SUKP). The second stage in the complete process is "Feature Selection", in which the most impactful set of features is selected. Then, "Training" is conducted to produce the best bespoke scheme for the "Optimisation" stage.

Fig. 1
figure 1

The complete process for building data-driven operator selection scheme

The training stage follows data generation and pre-processing, including feature selection. This is introduced in Fig. 2, where the necessary data is collated from the state information stored in "Data" and the pool of operators indicated as "Operators". The data is labelled by merging in the selected operator for each individual state and then fed into the "Training" component, which runs supervised learning algorithms such as Random Forest (RF), Multilayer Perceptron (MLP), and Support Vector Machine (SVM). The "Training" process repeats with the same set of labelled data in various bespoke forms until the learning metrics are satisfied. The output of the system is then the trained model, named the "Adaptive Operator Selection Scheme", holding knowledge of the states mapped to operators.

Fig. 2
figure 2

Training stage of operator selection scheme

Once the mapping between operators and states has been satisfactorily achieved, the adaptive operator selection scheme is readily built and can be used by the swarm optimisation algorithm, ABC henceforth, with operators pulled from the pool as guided by the adaptive selection scheme. Figure 3 presents the logic of how to utilise the trained machine learning model, which is inserted between the algorithm's evaluation module and the pool of operators. It is worth mentioning that the actual ABC algorithm employs bees to search through problem states; it selects one operator from the pool in accordance with the problem state as enforced by the adaptive selection scheme, which predicts the most suitable operator for generating the next position of the bees.
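The interaction described above can be sketched as a classifier predicting an operator index for the current state. Everything here is a hypothetical stand-in: `MajorityModel` mimics the `predict` interface of a trained scikit-learn classifier, and the feature vector and pool contents are placeholders.

```python
def select_operator(model, state_features, pool):
    """Predict the index of the operator best fitting the current problem state."""
    idx = model.predict([state_features])[0]
    return pool[idx]

class MajorityModel:
    """Stand-in for a trained classifier: always predicts operator index 0."""
    def predict(self, X):
        return [0 for _ in X]

# Hypothetical pool mirroring the four operators used in this study.
pool = ["flipABC", "nbABC", "nABC", "ibinABC"]
chosen = select_operator(MajorityModel(), [0.2, 0.5, 0.1], pool)
```

In the actual loop, `state_features` would be the fitness landscape feature vector computed for the bee's current state, and `model` would be the trained AOS scheme.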

Fig. 3
figure 3

Adaptive operator selection process trained with supervised machine learning algorithms interacting with ABC algorithm in each bee generation

The ABC algorithm implemented here is inspired by Durgut and Aydin (2021) and Durgut et al. (2022), and works with binary representation and a set of binary operators. The implemented ABC is sketched in Algorithm 1, presented in the Appendix. Lines 5 and 18 of the algorithm are the steps where the Adaptive Operator Selection Scheme takes action to select the operator best fitting the problem state in hand. Four state-of-the-art binary operators have been embedded in the pool of operators as part of the ABC implementation used in this study: flipABC() inverts a randomly selected bit, while novel binary ABC (nbABC) (Santana et al. 2019) crosses over n bits from a randomly chosen neighbouring solution. Likewise, nABC (Xiang et al. 2021) crosses over a number of bits, up to n, in a randomised way. Finally, improved binary ABC (ibinABC) (Durgut and Aydin 2021) applies an XOR operator – normalised with the number of iterations – to two chosen solutions.
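The first three operators can be sketched as follows. This is a simplified illustration of their published behaviour, not the exact implementations; the iteration-normalised XOR of ibinABC is omitted for brevity.

```python
import random

def flip_abc(solution):
    """flipABC: invert one randomly selected bit."""
    s = solution[:]
    i = random.randrange(len(s))
    s[i] = 1 - s[i]
    return s

def nb_abc(solution, neighbour, n):
    """nbABC-style move: copy exactly n randomly chosen bits from a neighbour."""
    s = solution[:]
    for i in random.sample(range(len(s)), n):
        s[i] = neighbour[i]
    return s

def n_abc(solution, neighbour, n):
    """nABC-style move: copy a random number of bits, up to n, from a neighbour."""
    return nb_abc(solution, neighbour, random.randint(1, n))
```

Each operator returns a new candidate solution, leaving the parent intact so that the greedy promotion step in ABC can compare the two.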

The complexity of the ABC algorithm has been studied in Wang et al. (2022), where the complexity is demonstrated to be \({\mathcal {O}}(\phi *|P|)\) across a variety of ABC versions. The complexity does not change in this implementation either, since only two simple operations are added. Here, |P| is the size of the swarm (colony), while the problem's own complexity is represented by \(\phi\). Neither the operator selection function nor any of the operators require any variable looping as part of their process; hence, the complexity remains the same.

3.2 Feature analysis and landscape features

Feature analysis and selection is the second stage of the process sketched in Fig. 1, in which the features identified to represent the problem state are reviewed and analysed with respect to their emphasis. The first, third, and fourth stages in the figure were introduced in the previous section. Numerous previous studies on the adaptive operator selection process took into account problem representation with a set of binary features, which was subsequently found to be non-scalable. The approach outlined below is scalable and makes use of the findings from earlier research.

Fitness landscape analysis provides representative information, which can be used in the characterisation of the search space and the position of the state of the problem at hand. The most representative information can be established from a large body of literature that has been developed over the past few decades. For example, the relevant literature can be found in Fragata et al. (2019); Ochoa and Malan (2019); Pitzer and Affenzeller (2012).

Diversity is one of the very important aspects of swarms to help characterise states (Erwin and Engelbrecht 2020), while Wang et al. (2017) discuss the evolvability of populations with dynamic landscape structure. Let \({\mathcal {A}}=\{{\textbf{a}}_i \,|\, i=1\dots |{\mathcal {A}}|\}\) be the set of attributes, i.e. features, with which the problem states are characterised, where the set consists of the attributes of individual solutions, \(\dot{{\mathcal {A}}}\), and the attributes of the population of solutions, \(\ddot{{\mathcal {A}}}\). This makes up a union: \({\mathcal {A}} = \dot{{\mathcal {A}}} \cup \ddot{{\mathcal {A}}}\).

A number of features can be retrieved from the state-of-the-art literature, as listed in Tables 1 and 2. The population-based metrics – considered as attributes, henceforth features – are listed in Table 1 with the corresponding calculation details. The first five metrics, \(\{\ddot{{\textbf{a}}}_i \,|\, i=1\ldots 5,\ \ddot{{\textbf{a}}}_i \in \ddot{{\mathcal {A}}}\}\), have been collected from Teng et al. (2016) and implemented for (i.e. adjusted to) the artificial bee colony (ABC) algorithm, one of the most recently developed and highly reputed swarm intelligence algorithms (Karaboga et al. 2014). In order to adjust them to binary problem solving, the distance-based metrics have been binarised using the Hamming distance, as in Erwin and Engelbrecht (2020). The metrics \(\{\ddot{{\textbf{a}}}_i \,|\, i=6\ldots 9,\ \ddot{{\textbf{a}}}_i \in \ddot{{\mathcal {A}}}\}\) are introduced and proposed in Wang et al. (2017) with sound demonstration, while \(\ddot{{\textbf{a}}}_{10}\) is obtained from the trial index used in ABC and utilised to measure/observe the iteration-wise hardness in problem solving. In addition, \(\ddot{{\textbf{a}}}_{11}\) is taken from Anescu and Ulmeanu (2017) to calculate the distance between the two farthest individuals within a population/swarm.

The literature includes more metrics calculated through local search procedures. However, these kinds of features, i.e. metrics, have been left out due to the scope of the study. In fact, it is known that access to preliminary information on search is not easy; hence, we rely on the change in instantaneous search information for online decision making.

The base notation of population-based features is as follows. Let \(P=\{p_i|i=0,1,...,N\}\) be the set of parent solutions and \(C=\{c_i|i=0,1,...,N\}\) be the set of children solutions reproduced from P, where each solution has D dimensions. Also, let \(F^p=\{f^p_i|i=0,1,...,N\}\) be the set of parent fitness values and \(F^c=\{f^c_i|i=0,1,...,N\}\) be the set of children fitness values. \(g_{best}\) represents the best solution found so far and \(p_{best}\) represents the best solution in the current population.
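Using this notation, two illustrative population-based measures in the spirit of Table 1, a Hamming-distance diversity and a parent-to-child improvement rate, might be computed as follows. These are sketches under the stated assumptions; the exact formulas in Table 1 may differ.

```python
def hamming(x, y):
    """Hamming distance between two binary solutions of equal length."""
    return sum(a != b for a, b in zip(x, y))

def population_diversity(P):
    """Mean pairwise Hamming distance over the swarm, normalised by dimension D."""
    N, D = len(P), len(P[0])
    total = sum(hamming(P[i], P[j])
                for i in range(N) for j in range(i + 1, N))
    return total / (D * N * (N - 1) / 2)

def improvement_rate(Fp, Fc):
    """Fraction of children improving on their parents (maximisation assumed)."""
    return sum(fc > fp for fp, fc in zip(Fp, Fc)) / len(Fp)
```

A fully homogeneous swarm yields diversity 0, while two complementary bit strings yield the maximum of 1.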

Table 1 Population-based features
Table 2 Individual solution-based features

On the other hand, a number of metrics, i.e. features, can be obtained from the auxiliary information of an individual solution; these serve individual-specific aspects upon which the operators can act significantly on a case basis. Individual-related characteristics are tabulated in Table 2; they are mostly proposed by Teng et al. (2016), except \(\dot{{\textbf{a}}}_7\), which is introduced in this study for the first time. The last feature is the success rate per operator j, calculated as \(\dot{{\textbf{a}}}_{8,j} = \frac{{sc}_j }{{tc}_j}\), where \({sc}_j\) is the success counter and \({tc}_j\) is the total usage counter.
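The operator success-rate feature \(\dot{{\textbf{a}}}_{8,j}\) can be maintained with simple counters; the class below is a sketch following that definition, with the per-operator bookkeeping as an assumption about how the counters would be updated.

```python
class OperatorStats:
    """Per-operator success-rate feature: success counter / total usage counter."""

    def __init__(self, n_ops):
        self.sc = [0] * n_ops   # success counters, one per operator
        self.tc = [0] * n_ops   # total usage counters, one per operator

    def record(self, j, improved):
        """Record one application of operator j; improved marks a successful move."""
        self.tc[j] += 1
        if improved:
            self.sc[j] += 1

    def success_rate(self, j):
        """The feature value for operator j; 0.0 before the first use."""
        return self.sc[j] / self.tc[j] if self.tc[j] else 0.0

stats = OperatorStats(4)
stats.record(0, True)    # operator 0 produced an improving child
stats.record(0, False)   # operator 0 failed to improve
rate = stats.success_rate(0)
```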

4 Experimental results

The experimental results have been collected over multiple runs of an Artificial Bee Colony algorithm developed for earlier investigations, embedded with a pool of operators from which one is selected at random each time a new solution is generated. Every successful move made during algorithm execution has been identified as a successful case and labelled accordingly.

Two well-known combinatorial optimisation problems have been considered as a test-bed: One-Max (Goëffon and Lardeux 2011) as a unimodal problem and Set Union Knapsack (SUKP) (Lin et al. 2019) as a multimodal problem. The sizes of the benchmark problems taken under consideration for One-Max and SUKP are 1000 and 500, respectively, while the maximum numbers of iterations are 150 and 500, respectively.

The preliminary experimentation demonstrated that the level of hardness and complexity depends very much on the progress of the search process. Hence, the entire search period is divided into three phases, as it is expected that the behaviour of the operators would vary significantly over time and across the stages of iterations. The subsequent subsection provides the relevant analysis.

4.1 Data generation and labelling

The first stage depicted in Fig. 1 concerns running the implemented ABC algorithm for data generation purposes. The data is generated by a random operator selection scheme embedded into the proposed artificial bee colony optimisation algorithm for both problem types under consideration, OneMax and SUKP, and is subsequently used for feature analysis, training, and testing purposes. The implemented ABC algorithm has been devised with a pool of operators, as mentioned above, and a random selection scheme to select an operator each time one is needed for generating a new solution. Then the metrics mentioned above, \(A = \dot{A} \cup \ddot{A}\), have been calculated to set up the values of each feature representing the solution. Given the circumstances, the child solution \(c_{k,l}\) will be represented with \(c_{k,l} = \{ a_{k,l,i} \,|\, i = 1 \ldots 19,\ a_{k,l,i} \in A\} \Leftarrow \{ \ddot{{\textbf{a}}}_i \,|\, i = 1 \ldots 11,\ \ddot{{\textbf{a}}}_i \in \ddot{A}\} \;\bigcup \;\{ \dot{{\textbf{a}}}_i \,|\, i = 1 \ldots 8,\ \dot{{\textbf{a}}}_i \in \dot{A}\}\), where k is the iteration index, while \(l=1 \dots N\) is the index of the solution within the population, i.e., the swarm. The solutions are paired with the operators selected, \(o_{i} = \mathop {\arg \min }\limits_{{o_{i} \in O}} \left\| {o_{i} - r} \right\|\), where \({\textbf{r}}\) is a randomly generated value. The new data point generated will be \(\bigl \langle c_{k,l},o_i \bigr \rangle\), where \(c_{k,l}\) represents the solution state and \(o_i\) stands for the selected operator. The complete data set is generated and formatted in this way in order to label the data for later use.
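The data generation and labelling step can be sketched as follows. The state vectors and the success flags are assumed inputs; the uniform operator choice mirrors the random selection scheme described above, and only successful moves are kept as labelled \(\langle c_{k,l}, o_i \rangle\) pairs.

```python
import random

def generate_labelled_data(states, n_ops, improved_flags):
    """Pair each observed state vector with the randomly selected operator,
    keeping only successful moves as labelled training examples."""
    data = []
    for state, improved in zip(states, improved_flags):
        op = random.randrange(n_ops)   # random operator selection scheme
        if improved:                   # successful move becomes a labelled case
            data.append((state, op))
    return data

states = [[0.1, 0.4], [0.3, 0.2], [0.9, 0.5]]   # illustrative feature vectors
improved = [True, False, True]
data = generate_labelled_data(states, 4, improved)
```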

The difficulty varies through the search process; due to this fact, the behaviour of operators changes throughout. In order to achieve better learning, we decided to split the search timeline into three phases and treat each of these phases separately and independently. It is well recognised that improving performance is simpler to accomplish in the early stages of the search process than in the middle stage, where improvement is still feasibly attainable; however, a positive gain in performance becomes very difficult later, i.e., in the final stage of the process. As a result, we have treated the complete search timeline as a three-phase process, where the behaviours of operators can alter. The data has been generated to cover all three phases equally, where the complete data set, \(\textrm{D}\), is defined as \(\textrm{D} = \textrm{D}_1 + \textrm{D}_2 +\textrm{D}_3\), with \(\textrm{D}_1\), \(\textrm{D}_2\), and \(\textrm{D}_3\) being the subsets of \(\textrm{D}\) representing the three phases of the search process, respectively. This factor has been considered in order to design training and test sets that are efficient and fair.
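Partitioning the data set into \(\textrm{D}_1\), \(\textrm{D}_2\), and \(\textrm{D}_3\) by iteration index might look like the sketch below. The split into equal thirds of the iteration budget is an assumption consistent with the three-phase treatment described above.

```python
def split_into_phases(dataset, iterations, max_iter):
    """Partition labelled data points into three search phases based on the
    iteration at which each point was generated (equal thirds assumed)."""
    D1, D2, D3 = [], [], []
    for point, k in zip(dataset, iterations):
        if k < max_iter / 3:
            D1.append(point)        # early phase
        elif k < 2 * max_iter / 3:
            D2.append(point)        # middle phase
        else:
            D3.append(point)        # final phase
    return D1, D2, D3

# Illustrative example with the One-Max iteration budget of 150.
D1, D2, D3 = split_into_phases(["a", "b", "c"], [10, 60, 140], 150)
```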

4.2 Feature exploratory analysis

The second stage of the framework described in Fig. 1 is to analyse the features for efficient algorithmic design. A set of exploratory analyses is conducted to explore both the relevance of input features and their relative importance to the task of operator selection. The latter is discussed further in Sect. 4.3.1. The tests are analysed and evaluated for each phase of the search process separately. That is, given the set of all input features, \(A\subset {\mathcal {A}}\), the objective is to examine whether a subset \(A' \subseteq A\) is associated with the target success operators corresponding to each search phase. The assumption made here is based on whether feature membership of \(A'\) is consistent. This in turn can be used to indicate the features that are most prevalent in predicting success operators for each search phase, and whether they are comparable across the two different optimisation problems. In the first test, which is depicted in Fig. 4 for the One-Max problem and Fig. 5 for SUKP, the strength of the linear relationship between input features with respect to each search phase was assessed.

Fig. 4
figure 4

Pearson correlation coefficient matrix for the features applied to One-Max problem. The matrices are ordered top-down per search phase; top is earlier and bottom is the final stage

Fig. 5
figure 5

Pearson correlation coefficient matrix for the features applied to SUKP problem. The matrices are ordered top-down per search phase

There is clearly apparent linearity – both positive and negative, as expected – among different groups of features in both optimisation problems. Furthermore, the strength of the relationship exhibits variability across the different search phases. In general, although the relative strength of association can serve as a guide for feature selection processes, further analysis and evaluation of feature importance in relation to operator selection is still necessary. In particular, we test whether the selected subset of features can accurately learn the target variables, i.e., success operators, associated with each optimisation problem, where membership of \(A^\prime\) can be relatively stable between the two optimisation problems.

Accordingly, for both the One-Max and SUKP problems, the Chi-square (\(\chi ^2\)) test – a test of whether two variables are related or independent of one another – is conducted to examine the dependency of the response variable (success operator) on the set of input features. The \(\chi ^2\) statistic, computed for each pair of feature classes, provides a score on the relative dependency between the values of each attribute and the different target classes. In order to predict the target class, i.e., the search operator, attributes with higher values of the \(\chi ^2\) statistic can be considered highly impactful. As a result, such attributes are typically chosen as input features in classifying the operators.
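A minimal hand-rolled version of the \(\chi ^2\) score for a discretised feature against the operator labels is sketched below; in practice a library routine (e.g. scikit-learn's `chi2`) would be used, and the discretisation of continuous features into classes is an assumption.

```python
from collections import Counter

def chi2_score(feature_classes, target_classes):
    """Chi-square statistic between a discretised feature and the target labels.
    Higher values indicate stronger dependency of the target on the feature."""
    n = len(target_classes)
    obs = Counter(zip(feature_classes, target_classes))
    f_marg = Counter(feature_classes)
    t_marg = Counter(target_classes)
    stat = 0.0
    for f in f_marg:
        for t in t_marg:
            expected = f_marg[f] * t_marg[t] / n
            observed = obs.get((f, t), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

# A perfectly dependent pairing scores high; an independent one scores 0.
dependent = chi2_score([0, 0, 1, 1], [0, 0, 1, 1])
independent = chi2_score([0, 1, 0, 1], [0, 0, 1, 1])
```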

The resulting ranking with \(\chi ^2\) for input features related to both optimisation problems (i.e., OneMax and SUKP) is shown in Fig. 6, where the top bar chart is plotted with OneMax results and the bottom with SUKP. The bar chart includes three bars colour-coded as 1.0, 2.0 and 3.0, representing the three phases of the search process: the early, middle, and late phases, respectively. The level of difficulty increases significantly across these stages. These labels are applied to all the relevant figures, Figs. 6, 7 and 8.

Although these appear to differ in importance between the two problems – specifically, SUKP seems to have more significant features than One-Max – there is still an interesting overlap between the two in terms of a subset of (dominant) input features {\(\dot{{\textbf{a}}}_{2}, \dot{{\textbf{a}}}_{4}, \dot{{\textbf{a}}}_{8,j}\)} – labelled in the figures as idp, ifp, osr – as well as an agreement on the relative irrelevance of further features to search operators. This additionally persists across the three search phases corresponding to both problems under consideration. It is interesting to see that the ranks of the early phase of the search are more apparent than the others, which indicates the significance of the features across the phases. Although such a finding may seem primitive – and is not conclusive given the nature of the examined problems – the resulting similarity can nonetheless be critical to examining potential prospects for learning a solution path (or important features) from one problem to another.

Fig. 6
figure 6

Ranking with \(\chi ^2\) for input features on successful search operators. Again, in both top (OneMax) and bottom (SUKP), ranking is ordered top-down per search phase

4.3 Building adaptive selection scheme with supervised learning

The third phase, as described in Fig. 1, is the process in which an adaptive data-driven operator selection scheme is developed via supervised machine learning approaches. As seen in Fig. 2, the data is preprocessed, labelled, and used for training and testing purposes towards building up an adaptive scheme. These methods, which use classifiers based on the MLP, SVM, and RF algorithms, have been used for feature analysis as well as for developing the adaptive scheme. Operator selection and the related supervised machine learning algorithms will be introduced in the following subsections, followed by discussions of each algorithm's performance. To further compare the performances, problem instances of the two chosen problems – OneMax and SUKP – will then be solved.

4.3.1 Operator selection/classification

Feature analysis helps derive insights into the impact of each feature in order to fine-tune the ML models for efficiency purposes. To assess the possible transferability of selected features from one search domain to another, the prediction of the different success operators at each search phase corresponding to the two different optimisation problems is subsequently evaluated. The success of operators relative to each search problem and each phase is shown in Table 3. This provides the setting for a supervised classification task in which problem features are the independent variables and the corresponding success operators are the target class.

Table 3 Success of operators for One-Max and SUKP search problems

In this regard, the success operators are predicted using three supervised classifiers: a Multi-layer Perceptron (MLP) with one hidden layer (a feed-forward ANN with the 'adam' solver), a Support Vector Machine (SVM) classifier with a radial basis function (RBF) kernel, and a Random Forest (RF) classifier of size 200. All three models have been widely used in classification tasks for decades; RF and SVM were additionally chosen for their ability to provide explicit feature importance rankings alongside their predictions, which we aim to utilise in the proposed hypothesis. We report the accuracy score as the prediction measure in Table 4.
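A minimal sketch of this evaluation setup, using scikit-learn and a synthetic stand-in dataset (the data, split ratio, and hidden-layer size are assumptions; only the solver, kernel, and forest size follow the description above):

```python
# Sketch: the three classifiers described above, trained and scored with
# accuracy, mirroring the evaluation reported in Table 4.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.random((300, 8))
y = rng.integers(0, 4, size=300)  # four candidate operators as target classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

models = {
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), solver="adam", max_iter=500),
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=200, random_state=1),
}
scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
```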

Table 4 The accuracy results for both problem types achieved by machine learning approaches across 3 phases

Interestingly, the performance of the classifiers on the two optimisation problems is relatively comparable. With the exception of SVM, which underperforms on One-Max compared to SUKP, the predictability of success operators from both individual and population domain features is consistent. It should be noted that the reported performance of the three classifiers can be tuned for further optimisation, which we aim to provide in a future study. In this study, however, the aim is to examine whether the predictability of success operators can be achieved with a subset of input features learnt on different search problem(s). To this end, the relative importance of input features for the classification tasks is computed and compared: the weighted coefficients of the feature vectors in the SVM classifier, and the feature importances from the resulting Random Forest classifier, normalised across the 200 decision trees to the range between 0 and 1. The results are shown in Fig. 7 for the One-Max problem and Fig. 8 for SUKP.
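Extracting the two importance signals can be sketched as follows. One caveat: SVM coefficient weights are only directly exposed for a linear kernel, so a linear SVC stands in here (an assumption on our part); the RF side uses scikit-learn's impurity-based `feature_importances_`, averaged over the trees, with both vectors then min-max normalised to [0, 1] as in the study. The data is synthetic.

```python
# Sketch: per-feature importance from an SVM (mean |coefficient|) and a
# Random Forest (impurity-based), normalised for side-by-side ranking.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.random((200, 6))
y = rng.integers(0, 3, size=200)

svm = SVC(kernel="linear").fit(X, y)
svm_importance = np.abs(svm.coef_).mean(axis=0)  # mean |weight| per feature

rf = RandomForestClassifier(n_estimators=200, random_state=2).fit(X, y)
rf_importance = rf.feature_importances_          # averaged over 200 trees

def minmax(v):
    return (v - v.min()) / (v.max() - v.min())

svm_norm, rf_norm = minmax(svm_importance), minmax(rf_importance)
```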

Fig. 7

Feature importance ranking for One-Max problem

Fig. 8

Feature importance ranking for SUKP problem

Once again, the results show promising findings, as a subset of features can be seen to have similar relative importance across both search problems. In fact, this reinforces the suggestion, observed earlier in the Chi-square test results, that there is a subset of features effective for the operator selection task, like \(A^\prime\), that can be transferred from one problem to another. It is worth mentioning that in both Figs. 7 and 8 the relative feature importance is computed over the whole set of features: the SVM weighs all input attributes, while the RF calculates class impurity (relative Shannon entropy) weighted by the probability of reaching the target class (success operator) for all features as they are re-sampled across the 200 trees, with the scores subsequently normalised. That is to say, when selecting the subset of effective features, their relative importance should be considered rather than the absolute values assigned to them.
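The impurity-weighted RF importance referred to above is commonly written as the mean decrease in impurity; a standard formulation (with \(T = 200\) trees here) is:

\[
\mathrm{Imp}(f) \;=\; \frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{n \in t\\ v(n) = f}} p(n)\,\Delta i(n),
\qquad
i(n) \;=\; -\sum_{c} p(c \mid n)\,\log_2 p(c \mid n),
\]

where \(p(n)\) is the proportion of samples reaching node \(n\), \(v(n)\) the feature used to split at \(n\), \(\Delta i(n)\) the entropy decrease achieved by that split, and \(c\) ranges over the target classes (success operators); the resulting per-feature scores are then normalised to \([0, 1]\).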

An assessment of which specific features are most relevant to success operator selection, and why, would be 'overenthusiastic' at this stage, especially as it would require an extensive characterisation of both search problems, which will be carried out in a later study. Here, however, the argument for finding a transferable \(A^\prime\) from one search problem to another seems plausible. To this end, the extent of predictability (solution quality) and robustness as features are reduced and transferred across different search domains should be examined further.

4.3.2 Comparative results

Further to the analysis and discussion of feature analysis and operator selection performance, this subsection examines the comparative results of solving instances of both the One-Max and SUKP problems. Tables 5 and 6 present the comparative results produced with all three supervised ML algorithms and random selection for solving One-Max instances, using the full set of features and the 10 most impactful selected features, respectively. The results are reported in both tables using the "Max" and "Mean" metrics, which make it easy to observe how significantly each algorithm improves on "Random Selection" in each case. The "Rank" column shows the relative performance in ranking order: "Random Selection" appears to outperform the ML algorithms on the first 3-4 easier and smaller instances, but the ML algorithms clearly outperform it on all remaining, larger instances. In Tables 5 and 6, RF outperforms all others with the consistently lowest average ranks of 1.95 and 1.324, respectively. Meanwhile, SVM performs better than MLP in both cases.
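The "Rank" column can be derived as sketched below: per instance, the algorithms are ranked by their "Mean" result (1 = best), and the ranks are then averaged over all instances. The values here are made up for demonstration; they are not the paper's results.

```python
# Sketch: average-rank computation over per-instance "Mean" results
# (higher mean is better for One-Max).
import numpy as np

algorithms = ["Random", "MLP", "SVM", "RF"]
means = np.array([
    [990, 985, 987, 988],  # small instance: random selection wins
    [940, 960, 965, 970],
    [880, 930, 940, 955],  # larger instances favour the ML schemes
])
# rank within each row: largest mean -> rank 1, smallest -> rank 4
ranks = means.shape[1] - means.argsort(axis=1).argsort(axis=1)
avg_rank = dict(zip(algorithms, ranks.mean(axis=0)))
```

The algorithm with the lowest average rank (here, RF) is then reported as the overall winner across the instance set.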

Table 5 Comparative results by supervised machine learning algorithms to build adaptive operator selection scheme for solving OneMax problem instances
Table 6 Comparison of ML models with 10-features for OneMax Problem

For a particular One-Max instance with 5000 dimensions, Fig. 9 plots the convergence data obtained from each of the six ABC variants, embedded with operator selection schemes built with the ML algorithms under consideration and taking either the full or the selected set of features. The convergence index, i.e. the objective function value as the measure of solution quality, starts drifting apart around iteration 50 and carries on accordingly. The last 50 iterations are zoomed in on in the subplot in order to observe the differences more clearly. As seen, convergence is faster with the feature selection (FS) variants than with their full-feature counterparts; the RF, SVM, and MLP models converge more slowly than their FS variants. Among the three ML algorithms, RF remains the fastest in convergence speed, while the models with the 10 most impactful selected features further increase performance. Although MLP-FS is not competitive with RF-FS and SVM-FS, it still improves on MLP in producing better solutions.

Fig. 9

Convergence graph on 5000 dimension OneMax problem

The comparative results for SUKP instances are provided in Tables 7 and 8, using the full set of features and the 10 features selected through feature analysis, respectively. Both tables present the results in two metrics, "Max" and "Mean", alongside the ranks among the competitors. The ranks are calculated with respect to the "Mean" measure, where all algorithms, i.e., RF, SVM, and MLP, remain competitive with slight differences in performance. SVM performs clearly worse than RF and MLP in both cases, with and without feature selection, while MLP does better than RF; however, RF-FS achieves a better average rank than MLP-FS. This suggests that feature selection contributes more to RF than to MLP, although the difference in average rank scores is slight.

Table 7 Comparative results of ML models with full set of features for solving SUKP instances
Table 8 Comparative results of ML models with the set of selected features for solving SUKP instances
Fig. 10

Comparative results by RF-FS with the state-of-art approaches for SUKP problem instances

Figure 10 plots the comparative results collected from the state-of-the-art approaches and from RF-FS, the winner of this study. The results for GA (Schmitt 2001) and binDE (Engelbrecht and Pampara 2007) are taken from the results tabulated in BABC (He et al. 2018), while those for GPSO are taken from Ozsoydan and Baykasoglu (2019). The performance indicator is on the vertical axis, while the instances are on the horizontal one. RF-FS, plotted in blue, clearly sits on top of all the scattered points. The results of RF and RF-FS for SUKP instances also remain comparable with other recent state-of-the-art works that apply online Reinforcement Learning (RL) policies for this purpose (Durgut and Aydin 2021; Durgut et al. 2022). Since the sets of operators are not the same, a direct comparison would not be fair, but the results look at least comparable.

5 Conclusions and future work

The development of an adaptive operator selection scheme remains challenging; a successfully developed scheme can be used to select the best operator given the search space and neighbourhood circumstances. To this end, this research has introduced an exploratory study investigating a number of supervised machine learning algorithms. A predictive analysis has been applied in order to reveal the impact of the identified features, and their dominance when either the full set of features or a selected subset is used to characterise the search space and the problem states for optimisation purposes. The idea is to identify the set of most impactful and prominent features that best represent a problem state and its standing within its neighbourhood, so that the best-fitting neighbourhood function, i.e., operator, among many alternatives can be selected to generate the next problem state, avoiding local optima for higher efficiency in the search process. A swarm intelligence algorithm, Artificial Bee Colony, has been used with a pool of neighbourhood functions, i.e., operators, to solve two different types of combinatorial optimisation problems utilising an adaptive operator selection scheme. The set of most prominent features is elicited through a ranking of weights using statistical and machine learning methods. The analysis demonstrated that a set of features consisting mostly of individual features is more discriminating than population-based metrics.

The research has shown that supervised machine learning techniques such as Random Forest, Support Vector Machine, and Multi-layer Perceptron are very useful in developing adaptive operator selection schemes. For both combinatorial problems, Random Forest provided the best results. However, the runner-up varies: SVM performs better than MLP on One-Max but worse on SUKP. The validity of the results has been verified by comparing the winning approach's success with state-of-the-art approaches.

An interesting preliminary finding of the study is that, even though the problem domain changed, the most effective features largely remained the same. This suggests that such information can be transferred between different problem domains. There are a number of interesting directions for future research in this area. For example, the success of transfer learning across problems needs to be investigated in terms of robustness and solution quality. For dynamic and more realistic problems, enlarged feature sets and more data will be used, considering Deep Learning techniques as well as active and Reinforcement Learning.