1 Introduction

Parameter setting is an important area of research in the evolutionary computation field. Since an a-priori identification of the optimal configuration of the parameters is always time-consuming and often impractical, one must employ a dynamic selection strategy of the optimal configuration which is performed while the search is being executed. In addition, a static set of parameters is not always the optimal choice for a large number of problems where self-adapting techniques have proven to be more effective (Eiben et al. 1999).

The problem of identifying the most suitable variation operator among several, also known as adaptive operator selection (AOS), can be divided into two sub-tasks: the credit assignment (CA) mechanism, used to evaluate the performance of the operators; and the operator selection (OS) rule, necessary to determine the most suitable operator using the information provided by the CA mechanism. The majority of the credit assignment approaches in literature are based on the evaluation of the fitness of the offspring generated by the operator, which is compared either to the current best solution (Davis 1989), to the median fitness (Julstrom 1995) or to the parents’ fitness (Barbosa and Sá 2000). A different strategy evaluating both fitness and diversity of the offspring was proposed in Maturana and Saubion (2008). The reward has been mostly considered as the value assessed during the last evaluation (instantaneous reward), as the average reward over a window of the last N evaluations (average reward), and as the biggest improvement achieved over a window of the last N evaluations (extreme reward) (Fialho et al. 2009).

The use of alternative metrics has been recently considered in Soria Alcaraz et al. (2014), where an evolvability metric replaces the evaluation of the fitness. A different approach for population-based meta-heuristics, proposed in Consoli and Yao (2014), assesses the reward as the proportion of solutions generated by each operator which have been selected by the ranking phase of the evolutionary algorithm. The credit assignment mechanism is coupled with operator selection rules such as probability matching (Goldberg 1990), adaptive pursuit (Thierens 2005) or multi-armed bandit solvers (MAB) (DaCosta et al. 2008). Several improvements of the MAB strategy have been proposed, as in Belluz et al. (2015), Chen et al. (2013) and Kim et al. (2012). Reinforcement learning has been also used in parameter setting (Karafotias et al. 2015), as in Eiben et al. (2007), where a reinforcement learning procedure is adopted to modify the parameters on-the-fly, or in Sakurai et al. (2010) where the selection probability of the operators is adaptively changed using a reinforcement learning approach.

From the analysis of the existing literature, it is clear that almost all the existing CA strategies rely exclusively on the mere evaluation of the fitness of the offspring. However, the information provided by the fitness at a single generation may not be sufficient to assess the optimality of an operator (e.g., in a landscape with a high degree of neutrality). The purpose of our work is therefore to develop a new dynamic CA mechanism which considers a suite of measures, and that can be adopted also as an operator selection rule. We consider the memetic algorithm with extended neighborhood search (MAENS*) (Consoli and Yao 2014) algorithm as a case study and for comparison purposes. More specifically, we aim to answer the following research questions in our paper:

  • RQ1 What kind of additional information can we provide to the credit assignment technique for a more “aware” calculation of the reward, and does this information effectively help to improve the prediction ability of the algorithm?

  • RQ2 What technique would be useful to handle such data and to select the most suitable operator in such a dynamic environment? Would the prediction ability of the technique be better than that of MAENS*? Would the use of this technique improve the optimization ability of MAENS*?

The contributions of our work include:

  • An ensemble of four different online fitness landscape analysis techniques, performed during the execution of the MAENS* algorithm in order to give a more accurate description of the current population (RQ1).

  • A credit assignment technique based on the use of an online learning algorithm to predict the reward of the most suitable operator (RQ2).

  • Two different reward measures are studied: one based on the survival ability of the offspring and another one based on the analysis of their diversity.

This work extends our previous work in Consoli et al. (2014) with new experiments and contributions. In particular: (a) we investigate the use of a novel reward measure called diversity-based reward (DBR); (b) we study the adoption of a different operator selection rule, named concurrent strategy (CS); (c) we extend our analysis by testing our algorithms on a dataset of large CARP instances. The results of the experiments carried out show that the proposed approach is able to produce results with comparable solution quality to a state-of-the-art strategy and reveal how in some cases the presence of a set of measures has a beneficial effect on the optimization ability of the AOS.

The rest of the paper is organized as follows. Section 2 introduces the case scenario and the base MAENS* algorithm. Section 3 describes the novel reward measures and operator selection rules investigated in this work. Section 4 describes the ensemble of fitness landscape techniques used in conjunction with the CA mechanism of the MAENS* algorithm. Section 5 describes the online learning algorithm that has been used and adapted for the CA system. Section 6 presents the proposed MAENS*-II algorithm. Section 7 describes the experiments that have been carried out to verify the assumptions of this research and their results. Finally, the last section includes the conclusions and some future work ideas.

2 Background

To investigate our research questions, we consider the MAENS* algorithm (Consoli and Yao 2014) for the capacitated arc routing problem (CARP) (Golden and Wong 1981) as the case study of this research, as it already utilizes an adaptive operator selection scenario and provides a basis for comparison with alternative techniques. The strong relationship between CARP and specific real-world problems, such as winter gritting, waste collection or postal service, makes this a problem of great interest for the scientific community, and a large number of heuristics, exact methods and meta-heuristics have been proposed for this problem and its many variants. Although the hyper-heuristic proposed in this work is applied to the capacitated arc routing problem, it would be possible to adapt it to other NP-hard problems by replacing the low-level heuristics and by identifying the fitness landscape analysis metrics that best describe the landscapes of those problems.

2.1 MAENS*

MAENS*, the case study for this research, extends the memetic algorithm named MAENS (Tang et al. 2009). MAENS is a memetic algorithm which makes use of a crossover operator, a local search combining three local move operators and a novel long move operator called MergeSplit, and a ranking selection procedure called stochastic ranking (SR) (Runarsson and Yao 2000). The major differences between MAENS and MAENS* are: (a) MAENS uses a single crossover operator, whereas MAENS* uses a set of crossover operators; (b) a dynamic MAB mechanism (dMAB) (Fialho et al. 2009) is adopted as an AOS rule; (c) a novel CA mechanism, named proportional reward, assigns a reward to the operators which is proportional to the number of solutions generated by each operator that “survived” the ranking phase; (d) the stochastic ranking is improved by also considering the diversity of the solutions (dSR); and (e) a novel diversity measure for the CARP search space is used.

The dMAB (Fialho et al. 2009) approach, adopted in this work, combines the UCB1 algorithm (Auer et al. 2002) with the Page-Hinckley (PH) statistical test (Hinkley 1971) to detect changes in the environment. When the PH test is triggered, the MAB system is restarted and the information gathered in the previous generations is discarded. The MAENS* algorithm represents one case study of our research, as the presence of a suite of crossover operators allows the study of other AOS approaches.

2.2 Capacitated arc routing problem

The capacitated arc routing problem (CARP) can be formally defined as the problem of minimizing the total service cost of a routing plan, given a set T of tasks (which corresponds to a subset of the arcs of a graph) and a fleet of m vehicles with capacity C. Each task t has a service cost sc, a demand d (the load of the vehicle necessary to service the task), a unique id, a reference to its head and tail vertices, and must be served once and entirely within the same route \(\mathbf {R_j}\). A CARP solution S can be represented as

$$\begin{aligned} S= \{\{t_0,t_k,...,t_l,t_0\}, \ldots , \{t_0,t_p,...,t_q,t_0\}\} \end{aligned}$$

which is a permutation of the whole set of tasks, divided into several routes \(\mathbf {R_j}\). Each route must start and end in a specific vertex called depot. We use a dummy task \(t_0\) with null demand and service cost to show the start and the end of a route in the depot. The service cost of a single route is calculated by adding the service cost of all the tasks in the route plus the cost of the shortest path sp between each task. The problem can be formally defined as follows:

$$\begin{aligned} \min \text{ TC }(S)= \sum _{i=1}^{\text{ length }(S)-1}(\mathrm{sc}(t_i)+\mathrm{sp}(t_i,t_{i+1})), \end{aligned}$$

subject to the constraints

$$\begin{aligned} \mathrm{load}(R_j)\le & {} C,\quad \mathrm{app}(t_i)=1 \quad \forall t_i \in T, \quad \text{ and }\quad m \le n_{\mathrm{veh}},\\ \mathrm{load}(R_j)= & {} \sum _{i=1}^{\text{ length }(R_j)}d(t_{ij}), \end{aligned}$$

where \(\mathrm{app}(t_i)\) gives the number of appearances of task \(t_i\) in the sequence of the tasks in S and \(n_{\mathrm{veh}}\) is the number of available vehicles.
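As an illustration, the objective and the constraints above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: a solution is a list of routes, each route is a list of task ids that starts and ends with the dummy task 0, and the lookup tables `sc`, `sp` and `demand` (as well as all function names) are hypothetical containers chosen for the example, not part of MAENS*.

```python
from collections import Counter

def route_cost(route, sc, sp):
    """Service cost of each task plus the shortest-path cost to the next task."""
    return sum(sc[a] + sp[(a, b)] for a, b in zip(route, route[1:]))

def total_cost(solution, sc, sp):
    """TC(S): sum of the costs of all routes in the solution."""
    return sum(route_cost(route, sc, sp) for route in solution)

def route_load(route, demand):
    """load(R_j): total demand of the tasks served in one route."""
    return sum(demand[t] for t in route)

def is_feasible(solution, tasks, demand, capacity, n_veh):
    """Check capacity, the single-appearance constraint and the fleet size."""
    # Count appearances of real tasks only; 0 is the dummy depot task (assumption).
    counts = Counter(t for route in solution for t in route if t != 0)
    served_once = set(counts) <= set(tasks) and all(counts[t] == 1 for t in tasks)
    within_capacity = all(route_load(r, demand) <= capacity for r in solution)
    return served_once and within_capacity and len(solution) <= n_veh
```

The dummy task has null service cost and demand, so including it in the sums does not alter the result.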

2.3 Recombination operators for CARP

As explained in Sect. 2.1, MAENS* uses a set of crossover operators instead of a single crossover operator. This section describes the four crossover operators introduced in MAENS* to deal with CARP.

2.3.1 Greedy sequence-based crossover (GSBX)

The GSBX operator can be considered a variant of the sequence-based crossover (SBX) operator. In this case, one route is extracted from each parent solution following a greedy strategy that biases the selection towards those routes whose vehicle is still not full (the total demand of the route is the smallest). Once selected, the two routes are recombined using a one-point crossover mechanism. The route generated in this way replaces the original routes of the parents to generate the new offspring. Since problems caused by double servicing of tasks or tasks not being serviced might arise, the solution goes through a repair phase to guarantee its feasibility.

2.3.2 Greedy route crossover (GRX)

In GRX, an offspring is created by alternately copying the routes of the parents into it. Routes are extracted from the parent solutions, giving higher priority to the routes with a higher quality measure. Tasks that have been inserted into the offspring are consequently removed from both parents, to avoid the double servicing of tasks. The procedure is repeated until the remaining routes in the parents have fewer than a certain number of tasks. In that case, the remaining tasks are inserted in the existing routes or merged into new ones.

2.3.3 Pivot-based crossover (PBX)

Two routes are randomly selected from the parent solutions. The PBX operator works by identifying, among the tasks belonging to such routes, the one that is most suitable to be placed in the middle of a route, which is named pivot, as it splits the route in two parts. The route is then rebuilt inserting the remaining tasks in the position that minimizes the total service cost of the route. Finally, the offspring is obtained by replacing the original route in one of the two parents. As in the case of GSBX, the solution goes through a repair phase to guarantee the feasibility of the solution.

2.3.4 Shortest path-based crossover (SPBX)

The SPBX operator works analogously to the PBX operator, except that in this case the pivot is represented by a path between two of the available tasks. The pair of pivoting tasks is selected as the one that serves the largest number of available tasks along its path and that minimizes the distance between the extremities of the path and the depot.

3 Adaptive operator selection

As previously mentioned, AOS is conventionally composed of two different sub-tasks: the credit assignment and the operator selection. For the former part, we propose the use of two different reward measures, named proportional reward (PR) and diversity-based reward (DBR). For the latter, we study the performance of two different strategies: a simple instantaneous reward (IR) approach and a concurrent strategy (CS)-based approach.

3.1 Credit assignment

The choice of the proper credit assignment strategy can be fundamental for the performance of the algorithm. As one objective of this work is to evaluate more than just the fitness of the individuals, we adopt two different strategies that involve the evaluation of different measurements. The first one, named proportional reward, was first used in Consoli and Yao (2014) together with a multi-armed bandit approach. For the second case, we develop a novel measure based on the evaluation of the diversity of the offspring, named diversity-based reward.

3.1.1 Proportional reward (PR)

PR (Consoli and Yao 2014) is a measure of the survival ability of the offspring generated by each crossover operator. We assign a reward r, where \(r \in [0,1]\) corresponds to the proportion of the solutions that have survived the selection phase of the algorithm and are going to become the parent population for the next generation. The use of this technique is a way to entrust the algorithm itself with the evaluation of the offspring. In the case of MAENS*, the offspring able to survive the ranking phase are evaluated according to their fitness value, the amount of violation of the constraints and the average pairwise diversity from the other individuals of the population. The performance of the crossover operator is in this case evaluated at the end of the generation: rather than evaluating the individuals as soon as they are generated, the PR evaluates their performance over a longer period of time (e.g., an iteration). The PR can be formally represented with the following formula:

$$\begin{aligned} \mathrm{PR}(i)^t = \frac{|x_i : x_i \in \text{ parent }^{t+1}|}{|\text{ parent }^{t+1}|} \end{aligned}$$

where i refers to the ith operator, \(x_i\) is an individual obtained through the use of operator i, t is the tth generation and parent\(^{t+1}\) is the parent population at the \(t+1\) generation. If more than one operator is used during the same generation, the PR can be calculated in the following way:

$$\begin{aligned} \mathrm{PR}(i)^t = \frac{|x_i: x_i \in \text{ parent }^{t+1}|}{|\text{ offspring }^{t}_i|} \end{aligned}$$

where \(\text{ offspring }^{t}_i\) is the set of individuals generated using the operator i during the tth generation.
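The two PR formulas can be illustrated with a short sketch. We assume here, purely for the example, that each individual of the next parent population is tagged with the label of the operator that generated it; the function name `proportional_reward` and its arguments are hypothetical.

```python
from collections import Counter

def proportional_reward(survivor_ops, operators, offspring_counts=None):
    """Return PR(i) for every operator i.

    survivor_ops: operator label of each individual in parent^{t+1}.
    offspring_counts: optional dict with |offspring_i^t| per operator; when
    given, the second formula (several operators per generation) is used.
    """
    survived = Counter(survivor_ops)
    rewards = {}
    for op in operators:
        if offspring_counts is None:
            denom = len(survivor_ops)            # |parent^{t+1}|
        else:
            denom = offspring_counts.get(op, 0)  # |offspring_i^t|
        # An operator with an empty denominator receives a null reward (assumption).
        rewards[op] = survived[op] / denom if denom else 0.0
    return rewards
```

With the second denominator, an operator that produced few offspring is not penalised merely for having been used less often.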

3.1.2 Diversity-based reward (DBR)

In the case of the DBR, we propose an approach that is opposite to that of the PR, as we evaluate the crossover offspring as soon as they have been generated. As one purpose of the crossover operator is that of introducing diversity in the population through the exploration of new areas of the landscape, we adopt a measure of the diversity introduced by the offspring. In particular, for each operator, we want to measure how distant the offspring are from the parent population, and how wide is the area explored. Therefore, we define a parent distance measure

$$\begin{aligned} P_d(x) = \frac{d(x,p_1)+d(x,p_2)}{2} \end{aligned}$$

as the average distance from the offspring x to its parents \(p_1\) and \(p_2\) and we can consequently compute the average parent distance for operator i, \(P_d(i)\), by averaging the \(P_d(x)\) of all the offspring generated by such operator. To measure the distance between individuals, we adopt the distance measure for CARP developed in Consoli and Yao (2014). The pseudocode of the distance measure is shown in Fig. 1. Since a CARP solution is represented by a sequence of tasks t, split into different routes, we can define \(p_i(t)\) and \(n_i(t)\) as two functions that return, respectively, the previous and the next tasks of task t in the sequence of solution \(S_i\). A task t has a perfect correspondence in both solutions if its previous and next tasks match. In the most extreme cases, for two solutions \(S_1\) and \(S_2\), the value of the distance measure will be equal to 1 if \( p_1(t) = p_2(t) \) and \( n_1(t) = n_2(t), \forall t\), and will be equal to 0 if (\( p_1(t) \ne p_2(t) \) and \( n_1(t) \ne n_2(t), \forall t\)). In the former case, the two solutions are identical, as there is a full correspondence between the \(p_i(t)\) and the \(n_i(t)\) of both solutions for each task t, while in the latter case the two solutions are completely different. It is important to point out that while the order in which the tasks in each route are serviced is considered, the order of the routes is not. Therefore two solutions are still identical if they have perfectly corresponding routes even if permutated in a different order. In all the other cases, when the correspondence is partial, the diversity measure will consequently assume values within the range [0, 1].

Fig. 1 Diversity measure for CARP solutions

We also define the coverage measure of the operator i

$$\begin{aligned} C_m(i) = \frac{\sum _a\sum _b d(x_a,x_b)}{N_i^2} \end{aligned}$$

as the pairwise average distance between any pair of individuals \(x_a\) and \(x_b\) that have been generated by it, where \(N_i\) is the number of individuals generated through the use of operator i. We can compute the DBR of the ith operator in the following way:

$$\begin{aligned} \mathrm{DBR}(i) = P_d(i)*C_m(i). \end{aligned}$$

Similarly to compass (Maturana and Saubion 2008), this credit assignment technique considers the diversity of the offspring as a criterion to evaluate the performance of the operators. However, there are several differences between the two approaches. First, the compass approach addresses the evaluation of both the fitness and the diversity, while DBR only considers the diversity, being focused on the evaluation of crossover operators exclusively. Secondly, compass makes use of the Hamming distance entropy as in Lardeux et al. (2006) to measure the population diversity, while DBR deals with both the average pairwise distance of the offspring and the distance from the parent population, using the CARP-based diversity measure shown in Fig. 1.
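The computation of the DBR for a single operator can be sketched as follows. The sketch is generic in the distance function `dist`, which stands in for the CARP measure of Fig. 1 (not reproduced here); the representation of the offspring as `(child, parent1, parent2)` triples and all function names are assumptions made for the example.

```python
from itertools import product

def parent_distance(child, p1, p2, dist):
    """P_d(x): average distance of an offspring from its two parents."""
    return (dist(child, p1) + dist(child, p2)) / 2

def average_parent_distance(families, dist):
    """P_d(i): mean of P_d(x) over the offspring of one operator."""
    return sum(parent_distance(c, p1, p2, dist) for c, p1, p2 in families) / len(families)

def coverage(children, dist):
    """C_m(i): sum of d(x_a, x_b) over all ordered pairs, divided by N_i^2."""
    n = len(children)
    return sum(dist(a, b) for a, b in product(children, repeat=2)) / n ** 2

def diversity_based_reward(families, dist):
    """DBR(i) = P_d(i) * C_m(i) for the offspring generated by operator i."""
    children = [child for child, _, _ in families]
    return average_parent_distance(families, dist) * coverage(children, dist)
```

Because the two factors are multiplied, an operator is rewarded only if its offspring are both far from their parents and spread apart from each other.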

3.2 Operator selection rule (OSR)

The second step of the AOS process is the operator selection rule. The OSR, given the information gained through the use of the credit assignment mechanism, needs to decide which operator is the most suitable and how to use it. A first problem in this context is that of balancing the exploration of all the operators against the exploitation of the most useful one. In other words, while using the operator that has performed the best so far, one wants to verify whether there is another operator that can do better. A second aspect is that of identifying changes during the execution. As the search goes on, the operator that has performed the best so far might not necessarily be the best one afterwards. It is therefore necessary to balance how much of the “history” relative to each operator one must consider to perform the selection.

In this work, we consider two different approaches for the OSR, namely a single operator-based approach named instantaneous reward and a reinforcement learning-inspired one called concurrent strategy.

3.2.1 Instantaneous reward (IR)

In the IR approach, the offspring are produced through the use of only one crossover operator per generation. As offspring and parent populations are merged into a single population, it is still possible to evaluate all the crossover operators that have generated a solution that is still present in the population. The operator to use in the next generation (\(t+1\)) is consequently selected as the operator \(op_i\) that has obtained the largest reward in the current generation (t):

$$\begin{aligned} \mathrm{op}^{t+1} = \mathop {\mathrm{arg\,max}}_{\mathrm{op}_i \in \text{ operators }} \mathrm{RW}(\mathrm{op}_i)^t \end{aligned}$$

given \(\mathrm{RW}()\) as a reward measure. Those operators having produced more “extreme” improvements (e.g., discovered new optima) with respect to the others, will have a more favourable evaluation that will last for more generations, even when they have not been selected for the current generation.

The information relative to the previous performances of the operator, except for the last iteration, is discarded. IR is, therefore, designed to be more sensitive to changes, having a bias on the performance of the operators during the previous generation. Finally, the adoption of such an approach carries the risk of completely eliminating an operator from the competition if none of its offspring are present in the current population.
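In code, the IR rule reduces to picking the operator with the largest reward of the current generation. The following minimal sketch assumes the rewards are stored in a dictionary keyed by operator name; the function name is hypothetical, and ties are broken by insertion order, which is an arbitrary choice of the example.

```python
def select_operator_ir(rewards):
    """Return the operator with the largest reward in the current generation."""
    # max() returns the first maximal key, so ties follow the dict insertion order.
    return max(rewards, key=rewards.get)
```

An operator whose reward drops to zero can only be selected again if it wins a tie, which reflects the elimination risk discussed above.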

3.2.2 Concurrent strategy (CS)

One of the disadvantages of adopting the instantaneous reward strategy is that it is not possible to identify changes in the environment when only one operator is used. A different approach, therefore, is that of allowing the use of all the operators during all the generations. Such an approach, named CS, aims to maximize the gain obtained by using the best-performing operator, thus allowing it to generate a larger fraction of the offspring, while the remaining part is still generated by the other operators. The CS is similar in its behaviour to the adaptive pursuit (AP) approach (Thierens 2005) in its intent to maintain a minimum percentage of the solutions to be generated by the worse-performing operators. The formula to assign the selection rate to each operator i is the following:

$$\begin{aligned} \mathrm{SR}_i = \mathrm{SR}_{\min } + (1-n \times \mathrm{SR}_{\min }) \dfrac{e^{\mathrm{RW}(op_i)^t/\psi }}{\sum _{j=1}^n e^{\mathrm{RW}(\mathrm{op}_j)^t/\psi }} \end{aligned}$$

where \(\mathrm{SR}_{\min }\) is the minimum selection rate, n is the number of operators, \(\mathrm{RW}(\mathrm{op}_i)\) is the reward calculated for the operator \(\mathrm{op}_i\) during the generation t, and \(\psi \) is a control parameter that regulates how quickly the system reacts to the changes in the environment. In this case, \(n=4\) since four operators are available.
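The selection-rate formula can be sketched as a softmax over the rewards with a guaranteed minimum share per operator. The default values of `sr_min` and `psi` below are illustrative assumptions and not the settings used in the experiments; the function name is also hypothetical.

```python
import math

def selection_rates(rewards, sr_min=0.05, psi=0.1):
    """Compute SR_i = SR_min + (1 - n * SR_min) * softmax(RW / psi)_i."""
    n = len(rewards)
    if n * sr_min > 1:
        raise ValueError("n * sr_min must not exceed 1")
    # Subtracting the maximum reward avoids overflow and leaves the softmax unchanged.
    top = max(rewards.values())
    weights = {op: math.exp((rw - top) / psi) for op, rw in rewards.items()}
    norm = sum(weights.values())
    return {op: sr_min + (1 - n * sr_min) * w / norm for op, w in weights.items()}
```

The rates always sum to 1, since the n minimum shares add up to \(n \times \mathrm{SR}_{\min }\) and the softmax distributes the remaining \(1-n \times \mathrm{SR}_{\min }\). Smaller values of \(\psi \) concentrate the remaining share on the operator with the largest reward.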

4 Online fitness landscape analysis

The existing fitness landscape analysis (FLA) techniques have been analysed with the aim of identifying those that can be used in the CARP context. This selection has been driven both by the need to reduce the computational effort, by exploiting calculations that are already performed by the algorithm, and by the need to identify measures able to “capture” different features of the landscape. We identified four FLA techniques, consisting of one evolvability measure, two neutrality measures and one fitness distribution measure, which describe different features of the landscape without substantially increasing the computational effort. The computation of these techniques is based on the evaluation of the neighbourhood of each solution. This neighbourhood is already generated through the initial iteration of the local search operator of the MAENS algorithm, using the three different move operators involved in this process (single insertion, double insertion, swap insertion). The FLA techniques are employed during each generation, and their results are used as input features of an online learning algorithm to predict the value of one of the two reward measures introduced in Sect. 3.1, to create a more accurate and informative “snapshot” of the current population which eventually might lead to a better selection of the crossover operator.

A final remark is necessary about constraint handling and how it affects the fitness of the individuals. The landscape in which MAENS* operates is that of solutions which may potentially violate the capacity constraints of the vehicles. Therefore, we consider the following fitness function, adopted from Tang et al. (2009):

$$\begin{aligned} f(S) = \mathrm{TC}(S) + \lambda *\mathrm{TV}(S) \end{aligned}$$

where \(\lambda \) is an adaptive parameter depending on the cost, on the violation and on the best feasible solution found so far, \(\mathrm{TC}(S)\) is the total cost of the solution and \(\mathrm{TV}(S)\) its total violation.

The rest of this section will introduce the four FLA techniques that have been considered in this work and how they are integrated in the MAENS* algorithm.

4.1 Accumulated escape probability

The accumulated escape probability (Lu et al. 2011) is a metric that aims to measure the evolvability, which can be defined as the capacity of the solutions to evolve into better solutions. The accumulated escape probability is obtained by averaging the mean escape rate (Merz 2004) (the proportion of solutions with equal or better fitness in the neighbourhood) of each fitness level with the formula:

$$\begin{aligned} \mathrm{aep} = \frac{\sum _{f_j \in F}P_j}{|F|},\quad \text{ where } F=\{f_0,f_1,\ldots ,f_L\} \end{aligned}$$

where \(f_j\) is a fitness level (the subset of all the solutions with fitness equal to a value \(f_j\)), \(P_j\) is the average escape rate of all samples belonging to the \(f_j\) fitness level and L is the number of possible fitness levels. Being the mean value of a set of probabilities, the aep will be 0 when the instance is hard and higher (up to 1) otherwise. The calculation of the aep requires the analysis of the neighbourhood of each solution in order to identify how many individuals have an equal or better fitness than the original individual. We analyse, therefore, the evolvability of the solutions which have been selected (with probability equal to 0.2) for the local search. Since the calculation of the neighbourhood of each solution corresponds to the first step of the local search, no significant additional cost is required to compute the aep.
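A possible way to compute the aep from the sampled neighbourhoods is sketched below. We assume a minimisation problem, so that “equal or better” means a fitness not larger than that of the original solution. We further assume that each sample is a pair `(fitness, neighbour_fitnesses)` and that only the fitness levels actually observed in the sample are averaged. The function names are hypothetical.

```python
from collections import defaultdict

def escape_rate(fitness, neighbour_fitnesses):
    """Proportion of neighbours with equal or better (not larger) fitness."""
    if not neighbour_fitnesses:
        return 0.0
    better_or_equal = sum(nf <= fitness for nf in neighbour_fitnesses)
    return better_or_equal / len(neighbour_fitnesses)

def accumulated_escape_probability(samples):
    """Average, over the observed fitness levels, of the mean escape rate per level."""
    levels = defaultdict(list)
    for fitness, neighbour_fitnesses in samples:
        levels[fitness].append(escape_rate(fitness, neighbour_fitnesses))
    if not levels:
        return 0.0
    mean_per_level = [sum(rates) / len(rates) for rates in levels.values()]
    return sum(mean_per_level) / len(mean_per_level)
```

Grouping by fitness level before averaging prevents a level with many sampled solutions from dominating the result.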

4.2 Dispersion metric

The analysis of the distribution of the solutions within the landscape can be sometimes used to understand more about the difficulty that a “jump” between fitness levels requires and to gain some information on the global structure of the landscape. In this context, the dispersion metric (dm) (Lunacek and Whitley 2006) is a technique to obtain information about the global structure of the landscape, by measuring the dispersion of good solutions. Ideally, if good solutions are very close, we might have a single funnel structure. If, on the contrary, solutions get more distant when their fitness improves, the landscape might be more like a multi-funnel structure. The analysis can be described as follows:

  1. A sample S of solutions is taken from the search space;

  2. the best \(S_{{best}}\) solutions are selected from S (using a threshold value);

  3. the average pairwise distances in S (\(\overline{d}(S)\)) and in \(S_{{best}} \)(\(\overline{d}(S_{{best}})\)) are calculated using the CARP diversity measure shown in Fig. 1;

  4. the dm is obtained as the difference between \(\overline{d}(S_{{best}})\) and \(\overline{d}(S)\).

The calculation of the pairwise distance between all the individuals of the sample is already performed during the diversity-based stochastic ranking of MAENS* by using the distance measure shown in Fig. 1, and therefore requires no additional cost. Thus, the dm can be computed on the set of all the popsize*offset individuals created during each generation of MAENS*. Finally, it is possible to rely on the ranking performed by the diversity-based stochastic ranking operator and choose these solutions as the subset of the best ones.
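The four steps of the dispersion metric can be sketched as follows. The sketch ranks the sample by fitness (assuming minimisation) rather than by the diversity-based stochastic ranking, and the `best_fraction` threshold, the generic `dist` function and the function names are assumptions of the example.

```python
from itertools import combinations

def mean_pairwise_distance(items, dist):
    """Average distance over all unordered pairs of distinct items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

def dispersion_metric(sample, fitness, dist, best_fraction=0.2):
    """dm: mean pairwise distance in S_best minus mean pairwise distance in S."""
    ranked = sorted(sample, key=fitness)              # best (lowest) fitness first
    k = max(2, int(len(ranked) * best_fraction))      # keep at least one pair
    best = ranked[:k]
    return mean_pairwise_distance(best, dist) - mean_pairwise_distance(sample, dist)
```

A negative value indicates that the best solutions are closer to each other than the sample as a whole, which is the behaviour associated above with a single-funnel structure.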

4.3 Average neutrality ratio and \(\Delta \)-fitness

Neutrality is the study of the width, distribution and frequency of neutral structures within a landscape (e.g., plateaus, ridges). A set of several neutrality measures was defined in Vanneschi et al. (2006). Among these, we select the following two:

  1. average neutrality ratio (\(\overline{r}\)): can be obtained by averaging the neutrality ratio (e.g., the number of solutions with equal fitness) of each individual with respect to its neighbourhood;

  2. average \(\Delta \)-fitness of neutral network (\(\Delta (\overline{f})\)): can be defined as the average fitness gain after one mutation step of each individual belonging to a neutral network.

In the same fashion as in the case of the aep, the computational effort of this technique can be absorbed by the generation of the neighbourhood of the initial solution during the local search.
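The two neutrality measures can be approximated from the same sampled neighbourhoods. The sketch below is a simplified reading of the definitions: the neutrality ratio is taken as the fraction of neighbours with equal fitness (up to a tolerance `tol`), and the \(\Delta \)-fitness as the mean fitness difference between each individual and its neighbours. Both simplifications, as well as the function names, are assumptions and not necessarily the exact formulation of Vanneschi et al. (2006).

```python
def neutrality_ratio(fitness, neighbour_fitnesses, tol=0.0):
    """Fraction of neighbours whose fitness equals that of the individual."""
    if not neighbour_fitnesses:
        return 0.0
    neutral = sum(abs(nf - fitness) <= tol for nf in neighbour_fitnesses)
    return neutral / len(neighbour_fitnesses)

def average_neutrality_ratio(samples, tol=0.0):
    """Mean neutrality ratio over all (fitness, neighbour_fitnesses) samples."""
    return sum(neutrality_ratio(f, neigh, tol) for f, neigh in samples) / len(samples)

def average_delta_fitness(samples):
    """Mean fitness change after one move, averaged over the sampled individuals."""
    gains = [sum(nf - f for nf in neigh) / len(neigh) for f, neigh in samples if neigh]
    return sum(gains) / len(gains) if gains else 0.0
```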

5 Online learning

The AOS model followed by MAENS* is that of the multi-armed bandit approach, where the UCB1 (Auer et al. 2002) algorithm is used to balance the exploration and exploitation of the crossover operators and the Page-Hinckley (Hinkley 1971) test is used to detect when a different operator has become the most suitable.

In this work, we propose the adoption of a different model. The abrupt and scarcely predictable changes of the most suitable operator which might happen during the search show many similarities to the notion of concept drift (Schlimmer and Granger 1986; Minku et al. 2010) in machine learning. Thus, in such a context, we might adopt an online learning algorithm capable of (a) predicting a reward for each operator using the online fitness landscape analysis measures and (b) tracking the changes of the environment, relying only on a limited number of training instances. We can define more formally the learning problem in the following way. At a given generation of the EA, we compute the FLA metrics (\(fla_1,fla_2,fla_3,fla_4\)) and the reward of each operator (\(\mathrm{RW}(\mathrm{op}_i)\)). Tuples (\(fla_1,fla_2,fla_3,fla_4, \mathrm{RW}(op_i)\)) are then used as training examples for the online learning algorithm, where (\(fla_1,fla_2,fla_3,fla_4\)) are the input features and (\(\mathrm{RW}(\mathrm{op}_i)\)) is the target output.

We employ the dynamic weighted majority (DWM) (Kolter and Maloof 2003) algorithm as our online learning algorithm, which has proved to be one of the most effective techniques in the task of tracking concept drifts. The DWM algorithm can be described as follows. A set of learners (called experts) are used to classify the incoming instances \(\{\overrightarrow{x},y\}\), where \(\overrightarrow{x}\) is the vector of n input features and y is the output feature. Each expert \(e_j\) has its own weight \(w_j\), and operates a classification \(\lambda \) of the instance. The global prediction is identified as the prediction with the largest sum of weights. All the experts which have failed to classify correctly the instance have their weights reduced by a \(\beta \) factor. Moreover, for every p instances, all the experts with a weight below a certain threshold \(\theta \), are deleted and a new expert is created if the global prediction is wrong.
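The description above can be turned into a compact sketch of the classification version of DWM. The base learner `MajorityLearner` is a toy stand-in introduced only to keep the example self-contained, and the default values of `beta`, `theta` and `p` are illustrative assumptions. The sketch follows the steps as described in the text and omits refinements of the original algorithm such as weight normalisation.

```python
from collections import Counter, defaultdict

class MajorityLearner:
    """Toy base learner (hypothetical): predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()
    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None
    def train(self, x, y):
        self.counts[y] += 1

class DWM:
    def __init__(self, make_learner, beta=0.5, theta=0.01, p=1):
        self.make_learner = make_learner
        self.beta, self.theta, self.p = beta, theta, p
        self.experts = [[make_learner(), 1.0]]   # pairs [learner, weight]
        self.seen = 0

    def predict(self, x):
        """Global prediction: the label with the largest sum of expert weights."""
        votes = defaultdict(float)
        for learner, weight in self.experts:
            votes[learner.predict(x)] += weight
        return max(votes, key=votes.get)

    def update(self, x, y):
        """Process one labelled instance and return the global prediction made for it."""
        self.seen += 1
        global_prediction = self.predict(x)
        for expert in self.experts:
            if expert[0].predict(x) != y:
                expert[1] *= self.beta           # penalise wrong experts
        if self.seen % self.p == 0:
            # Drop weak experts; add a fresh one if the ensemble was wrong (or is empty).
            self.experts = [e for e in self.experts if e[1] >= self.theta]
            if global_prediction != y or not self.experts:
                self.experts.append([self.make_learner(), 1.0])
        for learner, _ in self.experts:
            learner.train(x, y)
        return global_prediction
```

On a stream whose label changes abruptly, the experts trained on the old concept lose weight and are eventually removed, while the expert created after the change takes over the global prediction.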

5.1 DWM for regression tasks

As the DWM algorithm was originally conceived for classification, it is necessary to adapt and modify some of its mechanisms for the regression task of predicting the reward of a given operator based on the FLA techniques. A comparison between the revised DWM algorithm for the regression task (rDWM) and the original DWM itself is given in Fig. 2. The modifications introduced are:

  1. the global prediction \(\sigma _i\) is obtained by calculating the weighted average of all predictions (line 10);

  2. we consider a prediction correct if its difference from the output feature is less than a threshold \(\tau \) (lines 5–6);

  3. a new expert is created if the difference between the global prediction and the output feature is not less than the threshold \(\tau \), i.e. if the global prediction is wrong (lines 17, 18);

  4. we introduce a window wTS containing the last n instances, which is used to train the new experts upon creation (line 2).
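The modifications listed above can be sketched as follows. This is only an illustrative reading of the description, not the pseudocode of Fig. 2: the class names `RDWM` and `MeanLearner`, the default parameter values and the rescaling of the weights before pruning are assumptions, and the trivial `MeanLearner` merely stands in for the REPTree base learners used in the experiments.

```python
from collections import deque


class MeanLearner:
    """Hypothetical stand-in base learner (the paper uses REPTrees):
    it ignores x and predicts the mean of the targets seen so far."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def train(self, x, y):
        self.total += y
        self.count += 1

    def predict(self, x):
        return self.total / self.count if self.count else 0.0


class RDWM:
    def __init__(self, beta=0.5, theta=0.01, tau=0.1, period=1, window=5,
                 make_learner=MeanLearner):
        self.beta, self.theta, self.tau = beta, theta, tau
        self.period, self.make_learner = period, make_learner
        self.experts, self.weights = [make_learner()], [1.0]
        self.recent = deque(maxlen=window)  # wTS: the last n instances
        self.seen = 0

    def predict(self, x):
        # Global prediction: weighted average of the experts' predictions.
        total = sum(self.weights)
        return sum(w * e.predict(x)
                   for w, e in zip(self.weights, self.experts)) / total

    def update(self, x, y):
        self.seen += 1
        # An expert is "wrong" when its error is not below tau.
        for j, expert in enumerate(self.experts):
            if abs(expert.predict(x) - y) >= self.tau:
                self.weights[j] *= self.beta
        global_prediction = self.predict(x)
        if self.seen % self.period == 0:
            # Assumption: rescale so the largest weight is 1, then prune.
            top = max(self.weights)
            kept = [(e, w / top) for e, w in zip(self.experts, self.weights)
                    if w / top >= self.theta]
            self.experts = [e for e, _ in kept]
            self.weights = [w for _, w in kept]
            if abs(global_prediction - y) >= self.tau:
                # New expert, trained on the window of recent instances.
                newcomer = self.make_learner()
                for xi, yi in self.recent:
                    newcomer.train(xi, yi)
                self.experts.append(newcomer)
                self.weights.append(1.0)
        for expert in self.experts:
            expert.train(x, y)
        self.recent.append((x, y))
        return global_prediction
```

When the target changes abruptly (for example from 0 to 1), the experts trained on the old concept lose weight and are eventually pruned, while experts created from the window of recent instances take over the global prediction.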

Fig. 2 Pseudocode of DWM (left side) and DWM for the regression task (right side). The novelties introduced in the latter version, discussed in Sect. 5.1, are highlighted with a gray background color

6 MAENS*-II

The revised version of the algorithm adopting the rDWM as an AOS mechanism, named MAENS*-II, is shown in Fig. 3, along with the original MAENS* algorithm. Further information about MAENS* can be found in Consoli and Yao (2014). A set of four rDWM instances (one for each crossover operator) is created upon initialization of the algorithm (line 2). During each generation, one new training example is created for each rDWM instance, using the current set of FLA metrics as input features and the reward associated with the operator, obtained with a given credit assignment strategy, as the output feature (lines 10, 13, 14). The four rDWM instances are then used to predict the reward of each operator (line 4). Finally, an operator selection rule is adopted to choose the operators to use during each generation.
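The interplay between the four predictors and the operator selection can be illustrated with the sketch below. It is not the pseudocode of Fig. 3: `RunningMean` is a hypothetical placeholder for an rDWM instance, `run_generation` and `observe_reward` are illustrative names, and the proportional selection rule is an assumption made only to keep the example self-contained (the actual rules are those of Sect. 3.2).

```python
OPERATORS = ("GSBX", "GRX", "PBX", "SPBX")


class RunningMean:
    """Hypothetical stand-in for an rDWM instance: ignores the features
    and predicts the mean reward observed so far."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def train(self, features, reward):
        self.total += reward
        self.count += 1

    def predict(self, features):
        return self.total / self.count if self.count else 0.0


def run_generation(models, fla_metrics, observe_reward):
    """One AOS step: predict rewards, derive selection rates, then learn."""
    predicted = {op: models[op].predict(fla_metrics) for op in OPERATORS}
    total = sum(predicted.values())
    # Assumed selection rule: rates proportional to the predicted rewards,
    # falling back to a uniform choice when nothing has been learnt yet.
    if total > 0:
        rates = {op: predicted[op] / total for op in OPERATORS}
    else:
        rates = {op: 1.0 / len(OPERATORS) for op in OPERATORS}
    # One new training example per operator: (FLA metrics, observed reward).
    for op in OPERATORS:
        models[op].train(fla_metrics, observe_reward(op))
    return rates
```

Here `observe_reward` is any callable returning the reward measured for an operator in the current generation; in the first generation all operators receive the same rate, and the rates start to differ once rewards have been observed.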

Three different versions of the MAENS*-II algorithm were implemented, combining the two operator selection rules introduced in Sect. 3.2 with the two credit assignment mechanisms presented in Sect. 3.1. All the experiments were performed using the Weka (Hall et al. 2009) implementation of REPTrees as base learners. Table 1 lists the different versions of the algorithm and describes their components. It is worth noting that the combination of the DBR strategy and the instantaneous reward was not considered: since this strategy measures the reward of the crossover offspring and only one operator is used during each generation, it would lead to the exclusive use of that operator for the whole execution of the algorithm.

Fig. 3 Pseudocode of MAENS* (right side) and MAENS*-II (left side). The novelties introduced in the latter version, discussed in Sect. 6, are highlighted with a gray background color

6.1 Improvements on local search efficiency

One of the most effective features of MAENS (Tang et al. 2009) is its local search, which, however, has a high computational cost: the algorithm spends around \(95\,\%\) of its runtime performing this operation. Although the proposed modifications to the original MAENS algorithm, as explained in Sect. 4, cause no significant increase in the runtime of the algorithm, a fast implementation of the MAENS local search is introduced, which effectively reduces the runtime without incurring extra memory consumption. The approach is similar to the one introduced in Zachariadis and Kiranoudis (2010) for the vehicle routing problem, but without relying on the use of memory.

Table 1 List of algorithms

The approach can be summarized by the following points:

  1. Every individual a in the neighbourhood of a solution x is represented as a move M, where M stores the information relative to the move operator \(\mathrm{op}_i\) such that \(\mathrm{op}_i(x)=a\), the tasks involved in the move, and the variations in terms of fitness and violation of the constraints w.r.t. the values of the initial solution.

  2. The set of moves representing the whole neighbourhood is split into \(V=R*R\) subsets, where R is the number of routes of the initial solution. Each subset contains the moves relative to the move operators applied to the tasks belonging to the routes \(R_i\) and \(R_j\).

  3. During the first iteration of the local search, the whole neighbourhood of moves is produced. A storage array of size V is kept to store the best solution of each subset.

  4. The best move in the neighbourhood is identified with a computational time of O(M). If the best move belongs to the subset relative to the routes \(R_i\) and \(R_j\), the positions in the storage relative to the combinations involving either route are set to null.

  5. In the following iterations, the local search produces only the moves involving either the route \(R_i\) or \(R_j\) or both. The positions of the storage relative to such moves are consequently updated.

After the first iteration, the number of subsets to be evaluated is therefore decreased from \(R^2\) to \(2R+1\), resulting in a significant reduction in terms of size of the neighbourhood that is necessary to evaluate during each local search iteration. It is worth mentioning that the use of the move notation itself reduces the cost of evaluating the fitness and the violation of one individual from O(n) to O(k), where n is the number of tasks and k is equal either to 7 or 8 (depending on the move operator considered).
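The storage mechanism described above can be sketched as follows. The function `local_search_step` and the callback `best_move_in_subset` are hypothetical names; the sketch enumerates ordered route pairs for simplicity and does not attempt to reproduce the exact number of subsets refreshed by the original implementation.

```python
from itertools import product


def local_search_step(num_routes, best_move_in_subset, storage, touched=None):
    """Refresh the stored best move of each route pair and return the best.

    storage maps (i, j) to the best move of that subset, where a move is a
    tuple whose first element is the fitness variation (lower is better).
    touched is the set of routes changed by the previous move, or None on
    the first iteration, when the whole neighbourhood is generated.
    """
    for i, j in product(range(num_routes), repeat=2):
        if touched is None or i in touched or j in touched:
            storage[(i, j)] = best_move_in_subset(i, j)
    candidates = [key for key, move in storage.items() if move is not None]
    best_key = min(candidates, key=lambda key: storage[key][0], default=None)
    return best_key, storage.get(best_key)
```

After a move involving routes i and j has been applied, calling the function again with `touched={i, j}` regenerates only the subsets involving those routes and reuses the stored entries for all the others.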

Table 2 Characteristics of the instances of Groups A (top part) and B (bottom part)
Table 3 Parameters of the MAENS*-II algorithms

7 Experimental studies

A set of experiments was designed to understand the behaviour of MAENS*-II. As a first step, an oracle based on the Proportional Reward was implemented to obtain and analyse optimal crossover operator selection rates on a set of CARP instances. The oracle can be briefly described as follows. During each generation, four different populations are obtained, one with each crossover operator. All the individuals of the four populations are merged into a single population, which is sorted using the MAENS* ranking operator. The proportional reward mechanism is then used to assess the best operator. The results achieved by the oracle show that the predictions made by the dMAB are not optimal, as better results can be achieved. Besides, the results of the oracle should be considered “optimal” only when the proportional reward strategy is considered, because they might not necessarily be optimal in the presence of a set of multiple measures, as in the case of MAENS*-II, or when a different credit assignment strategy is considered.
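A single oracle step can be sketched as below. The function name `oracle_rewards` is hypothetical, a plain sort by fitness stands in for the MAENS* ranking operator, and the proportional reward is read here as the share of each operator's offspring that survive the ranking; both are simplifying assumptions.

```python
def oracle_rewards(offspring_by_operator, fitness, survivors):
    """Proportional reward of each operator for one generation.

    offspring_by_operator maps an operator name to the population it
    generated. The merged population is sorted by fitness (lower is
    better) and only the first `survivors` individuals are kept.
    """
    merged = [(fitness(individual), op)
              for op, population in offspring_by_operator.items()
              for individual in population]
    merged.sort(key=lambda pair: pair[0])
    kept = [op for _, op in merged[:survivors]]
    # Assumed reading: share of an operator's offspring that survive.
    return {op: kept.count(op) / len(population)
            for op, population in offspring_by_operator.items()}
```

The operator preferred by the oracle in that generation is then the one with the highest reward, e.g. `max(rewards, key=rewards.get)`.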

Two different datasets are considered for the experiments. The first one, named Group A, is composed of instances taken from the well-known benchmark test sets egl (Eglese 1994), Beullens’ C, D, E, F (Beullens et al. 2003) and val (Benavent et al. 1992). The second group (Group B) corresponds to the large-scale CARP instances of the dataset EGL-G (Brandão and Eglese 2008). The characteristics of each instance, in terms of number of vertices, number of edges, number of required edges and best fitness value found in the literature, are included in Table 2.

We provide an example of a solution for the D07 problem instance produced by one of the variants of the MAENS*-II algorithm in Fig. 4, to clarify what kind of results this algorithm produces.

Table 4 Statistics relative to the performances of the four crossover operators for datasets A and B

The set of parameters adopted in all the MAENS*-II algorithm variants, included in Table 3, was identified through trial and error and might not correspond to the optimal choice. With regard to the parameters shared with the other algorithms (MAENS and MAENS*), we adopted the same values in order to rule out differences in the results due to different parameter configurations. These parameters are also listed in Table 3. All the final results were obtained by averaging the output of 30 independent runs.

7.1 Single operator scenario

In order to understand the improvement achievable by MAENS*-II, the algorithm was executed on the two benchmark sets using each of the four available crossover operators in turn. The results of these experiments for Group A are included in Table 5. For each single-operator MAENS* version, the results show the average fitness over 30 independent runs, the standard deviation and the fitness of the best individual found. The last column, named best, shows the results that an “optimal” adaptive operator selection would achieve (picking the best results out of the four achieved). In the second to last row (named \(\#\)), the table provides the number of instances with statistically different results according to a Wilcoxon rank-sum test with Holm–Bonferroni correction at the significance level of 0.05. The row named W shows the number of comparisons won against the other algorithms. The Wilcoxon rank-sum test was performed on the results achieved on every instance by each pair of algorithms, with Holm–Bonferroni correction to deal with the multiple comparisons. The results across all the problem instances were then compared using the Wilcoxon signed-rank test. Each problem instance with comparable results was treated as a tie and therefore omitted from the test. The results of this test, subject to the Holm–Bonferroni correction, are included in the bottom row (pBest) of Table 5.
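For reference, the Holm–Bonferroni step-down procedure applied to a family of p-values can be sketched as follows. The function name `holm_bonferroni` is illustrative and the p-values in the usage note are made up; they are not results of this study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return, for each hypothesis, whether it is rejected at level alpha."""
    m = len(p_values)
    # Visit the hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, index in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break  # all the remaining hypotheses are retained
    return reject
```

With the p-values `[0.01, 0.04, 0.03, 0.005]` and a significance level of 0.05, only the first and the last hypotheses are rejected: the procedure stops at 0.03, which exceeds 0.05/2.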

Table 5 Experimental results on Group A using alternatively each crossover operator
Table 6 Experimental results on Group B using alternatively each crossover operator
Table 7 Experimental results on the instances of Group A relative to MAENS*, MAENS*-IIrw, the oracle, and MAENS* with random selection

For Group A, there is a large number of instances for which the four versions achieve statistically different results. The only exception is the comparison between the PBX- and the SPBX-based versions, for which there is a limited number of statistically different instances (7); in all the other cases, there are at least 24 statistically different instances. The GSBX-based version seems to be the one performing the worst, losing the comparison on most of the instances, while the other three versions achieve the best results a similar number of times. The statistical difference between the results of the GSBX version and the other three versions is also confirmed by the results of the Wilcoxon signed-rank test. None of the four single-operator versions of the MAENS* algorithm is able to perform as well as the “optimal” results (in the best column), as shown by the results of the Wilcoxon signed-rank test included in the pBest row of Table 5.

The results of the comparison for the CARP instances belonging to Group B are included in Table 6. The table shows the results of the four different versions of the algorithm, each based on one of the four crossover operators available. As in the previous table, it presents the average fitness, the standard deviation (std) and the best result found (best) for each algorithm. The best results are always achieved by the PBX- and SPBX-based versions of the algorithm (which provide better results on all statistically different instances against the other two versions). The GRX-based version, in contrast, is the one that performs the worst (losing the comparison five times out of six against the GSBX-based version and on all the statistically different instances against the other two versions).

In both datasets, SPBX and PBX appear to be the operators whose usage leads to the best results. This can be explained by the fact that these operators can introduce a fair amount of diversity in the offspring, as one or more routes are built from scratch. At the same time, they maintain the good traits of the parents by copying the routes that are not affected by the recombination. The GSBX operator, in contrast, might not introduce much diversity in the offspring, as the new routes are a combination of the subroutes of the parents. Therefore, despite being the least disruptive operator, in the long term it makes a smaller contribution than SPBX and PBX. The GRX operator, on the other hand, has a larger disruptive capacity, as only the best routes are preserved in the offspring. In the context of large instances with a great number of routes, as in the case of dataset B, this operator might therefore introduce an excessive level of exploration and consequently perform worse than the others.

A further experiment was conducted to analyse the behaviour of the four crossover operators. A population of 10,000 solutions was generated using the initialization operator. Each operator was then used to generate a population of 10,000 solutions, using a random parent selection mechanism. Table 4 reports the number of instances for which each operator achieved the worst results in terms of fitness, violation, average pairwise distance of the offspring population and average distance from the parents. This experiment was repeated for both datasets. The results show that in dataset A the operator GSBX achieves the worst results in the largest number of instances for each of the characteristics analysed. This is consistent with the results achieved by the four evolutionary algorithms. On the other hand, for dataset B, GRX is the worst operator for both fitness and violation and the second worst for both diversity measures, which reflects the behaviour of the corresponding algorithm. It is, however, worth specifying that these results refer to the behaviour of the operators on a population of low-quality solutions (as they are generated through the initialization operator) and might not necessarily reflect the behaviour of the crossover operators during the more advanced phases of the search.
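The two diversity statistics mentioned above can be computed as in the following sketch. The distance between CARP solutions used in the experiments is not restated here, so a Hamming distance over equal-length sequences is used purely as a placeholder; the function names are likewise hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Stand-in distance: number of positions where two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def average_pairwise_distance(population, distance=hamming):
    """Mean distance over all unordered pairs of individuals."""
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def average_parent_distance(families, distance=hamming):
    """Mean child-parent distance; families holds (child, parent_1, parent_2)."""
    if not families:
        return 0.0
    total = sum(distance(child, p1) + distance(child, p2)
                for child, p1, p2 in families)
    return total / (2 * len(families))
```

Note that the pairwise statistic requires a number of distance evaluations that is quadratic in the population size, which matters for populations of 10,000 solutions.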

Table 8 Experimental results on the instances of Group A for MAENS*-IIa, MAENS*-IIb and MAENS*-IIc
Table 9 Experimental results on the instances of Group B for MAENS*-IIa and MAENS*-IIb

7.2 Operator selection rules and reward measures: a comparison

The performance of the algorithm using different operator selection rules and reward measures is shown in Tables 8 and 9 for Groups A and B, respectively. We include the results of the three combinations introduced in Table 1, along with the optimal result considering the best performance of the single-operator versions (in the last column, named best). In Table 8, the results of the statistical tests show that the three versions of the algorithm achieve statistically different results only on a limited subset of the instances (at most 7, between MAENS*-IIa and MAENS*-IIc). The versions achieving the best results appear to be the ones adopting the concurrent strategy as an OSR (MAENS*-IIa and MAENS*-IIb). These two versions achieve extremely similar results (differing only in three instances), while the version using the instantaneous reward (MAENS*-IIc) differs from the other two in seven and five instances, respectively, and loses the comparison in the majority of the cases. In contrast, MAENS*-IIc is the variant that differs the least w.r.t. the best results achieved by the single-operator versions of the algorithm (only six statistically different instances, while the other two versions differ in eight and nine instances).

Although these results show small differences between the performances of the algorithms when adopting one OSR rather than the other, the concurrent strategy appears to perform slightly better. This might be explained by several factors. First, the use of more than one crossover operator might introduce higher diversity in the whole offspring population. Second, the capacity to monitor and verify the performance of all the crossover operators might be important to detect changes in the environment. With regard to the reward measure adopted, the two approaches achieved similar results. This could be explained by the similar importance of the requirements that the two measures try to satisfy (diversity and survival ability of the offspring). The balance might be different when tackling larger CARP instances, such as those in Group B, where the exploration ability of the operator might have a bigger impact on the performance of the algorithm.

The results achieved by MAENS*-IIa and MAENS*-IIb on Group B are included in Table 9. The two algorithms show comparable results on nine instances out of ten, the only statistically different result according to the Wilcoxon rank-sum test being that of the instance EGL-G2-C, with a p-value of 0.0004. The similarity of the results achieved by the two versions of the algorithm in both datasets can be explained by the fact that the use of the FLA metrics makes the algorithm more robust with respect to the reward measure considered. Further experiments with more aggressive credit assignment strategies might reveal more differences between the two reward measures. Finally, we provide a comparison with a version of the algorithm selecting one crossover operator randomly during each generation. The results of this algorithm are included in Table 7, in the rightmost column named random. The bottom rows report the number of statistically different instances according to the Wilcoxon rank-sum test with a level of significance of 0.05 (row \(\#\)) with respect to the three algorithms MAENS*-IIa, MAENS*-IIb and MAENS*-IIc, along with the number of times the random algorithm has won the comparison (row W). It is worth noting that the random algorithm achieves a fairly good performance, as it achieves statistically comparable results with the proposed techniques for most of the instances. This result could be interpreted as a probable sign of positive interaction between the crossover operators that have been considered in this case study.

Table 10 Comparison with some state-of-the-art approaches for CARP

7.3 Effectiveness of the FLA measures

An experiment was designed to understand whether the use of the online FLA techniques has a beneficial effect on both the optimization ability and the prediction capacity of the algorithm. To this end, MAENS*-IIc was compared to MAENS*-IIrw, a version of the algorithm which only makes use of the Proportional Reward measure as an input feature of the learning algorithm, without considering the values provided by the FLA techniques. In this context, we are not interested in the results achieved by the algorithm; rather, we want to verify whether the results are significantly different and, if so, to show that the rDWM algorithm can make use of the FLA measures. The results of this algorithm are included in Table 7 in the column MAENS*-IIrw. A Wilcoxon rank-sum test was performed against the results achieved by MAENS*-IIc. The two algorithms produced statistically different results on 36 instances out of 53. MAENS*-IIrw achieved better results only on three instances, losing the comparison on 33. A Wilcoxon signed-rank test was consequently applied across the problem instances, which confirmed that the two algorithms produce significantly different results (respectively \(W_{\mathrm{stat}}=26\) with \(p<0.05\) and \(W_{\mathrm{stat}}=54.5\) sample size: 42). This can be interpreted as a signal that the rDWM is concretely affected by the FLA measures, which influence (in a beneficial way) the decisions made by the algorithm.
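The statistic of the Wilcoxon signed-rank test used for such paired comparisons can be computed as in the sketch below. The function name `signed_rank_statistic` and the sample values in the usage note are illustrative; ties between the two samples are dropped and tied absolute differences receive average ranks, which is one common convention and is assumed here.

```python
def signed_rank_statistic(a, b):
    """Return (W, n): the smaller of the two signed rank sums and the
    number of non-zero paired differences."""
    # Pairs with identical values are ties and are omitted from the test.
    diffs = sorted((x - y for x, y in zip(a, b) if x != y), key=abs)
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(diffs):
        end = start
        # Group equal absolute differences and give them their average rank.
        while end + 1 < len(diffs) and abs(diffs[end + 1]) == abs(diffs[start]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[k] = average_rank
        start = end + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus), len(diffs)
```

For the paired samples `[10, 12, 9, 15, 11]` and `[8, 12, 10, 11, 14]`, the second pair is a tie and is discarded; the positive ranks sum to 6 and the negative ones to 4, so the statistic is 4 with a sample size of 4.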

7.4 Comparison with the state-of-the-art

The second research question in the introduction of this paper focuses on the performance of the proposed approach with respect to the existing ones. Therefore, the MAENS*-II variants that make use of the Proportional Reward (a and b) were tested against the oracle. All three variants were also compared against the results achieved by their base algorithm MAENS*. The results achieved by the oracle and by the MAENS* algorithm for Group A are included in Table 7, in columns MAENS* and oracle. In the bottom rows, the results of a Wilcoxon rank-sum test with Holm–Bonferroni correction at the 0.05 significance level show the number of instances with statistically different results. This number is small (4 for MAENS*-IIa and MAENS*-IIc and 2 for MAENS*-IIb). In these few instances, MAENS*-IIa and MAENS*-IIb perform better than MAENS*, while MAENS*-IIc wins the comparison in half of the instances (2 out of 4). The online learning system is therefore able to achieve results comparable to those achieved by the bandit solver.

The comparison with the oracle shows that MAENS*-IIa and MAENS*-IIb are able to achieve comparable results in most cases. In most of the instances with statistically different results, the oracle performed better. It is worth noting, however, that in a small number of instances the algorithm using the FLA measures was able to produce better results than the oracle. This provides some evidence that, if the oracle represents a “lower bound” for the results that it is possible to achieve using the proportional reward, the use of more than one measure (as in this case) can help the algorithm to achieve results beyond this bound.

Fig. 4 The network relative to the D07 instance (a). The cost of serving each edge is proportional to the thickness of the line. Non-required edges are identified by dotted lines. Panels (b–e) show the four routes that compose a solution for this problem instance, generated by the MAENS*-IIa algorithm. The edges served during each route are highlighted in black

Finally, the results achieved by MAENS*-IIa and MAENS*-IIb, included in Table 9, are compared against four state-of-the-art algorithms, whose results are included in Table 10. We consider the results of MAENS* (Consoli and Yao 2014), MAENS-RDG (Mei et al. 2014a), VND (Mei et al. 2014b) and an algorithm combining iterated local search and variable neighbourhood descent (Martinelli et al. 2013).

MAENS*-IIa and MAENS*-IIb, as well as MAENS*, outperform all the other algorithms in terms of solution quality for the first five instances of Group B (Table 9). Together, they produce a new best-known solution for each of these instances, with MAENS*-IIa achieving the best ones on the first two instances (G1-A and G1-B), MAENS*-IIb on instances G1-C and G1-D, and MAENS* on the instance G1-E. On these five instances, MAENS*-IIa and MAENS*-IIb also achieve the best average fitness in four cases. For the following five instances, the best results are achieved by either MAENS-RDG or VND. In all these cases, MAENS* is outperformed by both MAENS*-IIa and MAENS*-IIb. These results can be explained by the fact that their base algorithm, MAENS*, already performs well on these instances. However, both variants MAENS*-IIa and MAENS*-IIb managed to outperform MAENS* in most of the instances. It is important to note that the runtime (not considered in this work) of these algorithms is not comparable to that of the decomposition-based approaches, which manage to find these results in a fraction of the time required by MAENS*-IIa and MAENS*-IIb.

Fig. 5 Boxplots relative to the four instances EGL-G2-A (a), egl-s1-b (b), egl-s2-b (c), egl-e4-c (d). The boxes refer to the first quartile, median and third quartile. Whiskers show minimum and maximum values over a sample size of 30 best fitness values relative to the independent runs of each algorithm reported in Tables 5, 6, 7, 8, 9 and 10

The behaviour of the algorithms can also be analyzed in terms of the fitness distribution of their solutions. Figure 5 shows the box plots relative to three representative instances belonging to Group A (egl-e4-C, egl-s1-B, egl-s2-B) and one instance of Group B (EGL-G2-A). In the case of the EGL-G1-A instance, SPBX and GRX are the crossover operators whose usage leads to the distributions with the lowest median. The distributions of the three AOS strategies considered in this case (MAENS*, MAENS*-IIa and MAENS*-IIb) are centered around the same median value, although MAENS*-IIa is capable of producing solutions of considerably better quality (bottom whisker), which translate into new minima for this instance. For the egl-s1-B instance (Fig. 5b), the behaviour of the algorithms is quite similar, as in most cases the distributions lie around the same median. When considering the results of the versions of the algorithm using each crossover operator, GSBX, GRX and SPBX show a much wider distribution of their results, although in the first two cases a large number of solutions are equal to the median value, while PBX results are much less spread. The different AOS strategies achieve overall comparable results. This instance represents an example of non-optimal behaviour, as none of the AOS strategies considered has managed to match the results of the best crossover operator (GRX).

For egl-s2-b (Fig. 5c), PBX is the operator that achieves the best results, while GRX performs the worst. MAENS*-IIb manages to achieve the same solution quality and similar median to PBX. This is also confirmed by the larger selection rate given to the PBX operator (Fig. 7b).

In the case of egl-e4-c (Fig. 5d), the PBX and SPBX distributions have a similar median and similar quartiles, and perform the best among the four crossover operators. Among the AOS strategies, MAENS*-IIb solutions are distributed around a similar median but are more spread out.

7.5 Prediction ability

To understand the behaviour of the algorithms, and to gain a deeper understanding of the selection mechanisms, we provide a comparison of the selection rates of the four different crossover operators in Figs. 6, 7 and 8. The plots refer to the instances egl-s1-B, egl-s2-B and EGL-G2-A. The y-axis refers to the selection rate (SR) of each crossover operator, where an SR of 0 means that the operator is never selected and an SR of 1 means that only that operator is selected. The x-axis corresponds to the average fitness of the population, discretised into 50 intervals. We therefore study how the SRs of the four operators change while the search is carried out and the average fitness of the population decreases.
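One possible way to obtain such curves from a run log is sketched below. The function name `selection_rates` and the log format (one pair of average fitness and per-operator usage counts per generation) are assumptions, as is the use of equal-width intervals between the minimum and maximum average fitness.

```python
def selection_rates(history, operators, bins=50):
    """history: one (average_fitness, {operator: times_used}) per generation.

    Returns, for each fitness interval, the share of uses of each operator,
    or None for intervals that contain no generation.
    """
    fitnesses = [fitness for fitness, _ in history]
    low, high = min(fitnesses), max(fitnesses)
    width = (high - low) / bins or 1.0  # avoid a zero width if all are equal
    counts = [dict.fromkeys(operators, 0) for _ in range(bins)]
    for fitness, used in history:
        # The maximum fitness falls into the last interval.
        index = min(int((fitness - low) / width), bins - 1)
        for op, times in used.items():
            counts[index][op] += times
    rates = []
    for interval in counts:
        total = sum(interval.values())
        rates.append({op: interval[op] / total for op in operators}
                     if total else None)
    return rates
```

Since the average fitness decreases during the search, reading the returned list from the last interval to the first follows the progress of the run.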

Fig. 6 Selection rates for the instance egl-s1-b. Each graph shows the selection rates of the four different crossover operators (GSBX, GRX, PBX, SPBX) when using respectively MAENS*-IIa (a), MAENS*-IIb (b), MAENS*-IIc (c), MAENS* (d) and the oracle (e)

Fig. 7 Selection rates for the instance egl-s2-b. Each graph shows the selection rates of the four different crossover operators (GSBX, GRX, PBX, SPBX) when using respectively MAENS*-IIa (a), MAENS*-IIb (b), MAENS*-IIc (c), MAENS* (d) and the oracle (e)

Fig. 8 Selection rates for the instance EGL-G1-A. Each row shows the selection rates of the four different crossover operators (GSBX, GRX, PBX, SPBX) when using respectively MAENS*-IIa (a), MAENS*-IIb (b) and MAENS* (c). We do not include the results on EGL-G2-A for MAENS*-IIc, as only versions MAENS*-IIa and MAENS*-IIb were tested on this dataset, nor the oracle results, due to the extremely high computational cost required to perform this task

In the first instance, egl-s1-B (Fig. 6), three phases can be identified in the oracle prediction (Fig. 6e): a first phase where the GRX operator is preferred over the others, an intermediate phase where the GRX and GSBX operators have nearly equal selection rates, and a last phase characterized by a rise of the selection rate of the GRX operator, which reaches 1 in the last moments of the search.

Both MAENS* (Fig. 6d) and MAENS*-IIc (Fig. 6c) award the GSBX operator with the highest selection rate for the whole search, missing the prediction of the change in the environment made by the oracle. It is possible to see, however, how MAENS*-IIc increases the selection rate of GSBX more rapidly than MAENS*.

The SRs of both MAENS*-IIa (Fig. 6a) and MAENS*-IIb (Fig. 6b) show different changes during the search, indicating that the concurrent strategy (CS) is more successful in predicting such events. In particular, MAENS*-IIb acknowledges the operators GSBX and PBX as the most useful ones during the search. It is worth remembering that MAENS*-IIa makes use of a different reward measure and, therefore, is not comparable to the prediction made by the oracle. In this case, MAENS*-IIa, after an initial period of dominance of the operator GRX, shows an alternation of moments where the three operators GRX, PBX and SPBX show the highest selection rates.

On the second instance (Fig. 7), the oracle identifies a change in the environment halfway through the search (Fig. 7e). The concept drift is not detected by either MAENS* (Fig. 7d) or MAENS*-IIc (Fig. 7c), which, however, shows a higher exploitation of the GSBX operator. MAENS*-IIb (Fig. 7b) identifies the operators GSBX and PBX as the most successful ones; in this case too, the change detected by the oracle is not detected. As for the previous instance, MAENS*-IIa (Fig. 7a) shows different moments where the three operators GRX, PBX and SPBX achieve the highest SR. The lowest SR for GSBX seems to indicate that this operator is probably the one that introduces the least diversity in the population.

For the large CARP instance EGL-G2-A, the behaviour of MAENS* (Fig. 8c) shows a predominance of operator GSBX over the other ones. MAENS*-IIb (Fig. 8b) shows a behaviour similar to that of MAENS*, identifying the GSBX operator as the one with the best performance during almost the whole search. Finally, MAENS*-IIa again shows an initial period of higher performance for the GRX operator, followed by an alternation of the PBX and SPBX operators (Fig. 8a). The occurrence of this initial period seems to suggest that the GRX operator introduces the highest diversity in the initial part of the search, when the solutions are not yet of high quality.

The results of these experiments show that failing to detect a change in the environment does not necessarily translate into a worse performance of the algorithm, and vice versa. This is confirmed by the fact that the algorithms produce good results despite the different selection rates. The relationship between the prediction ability of the algorithms and their results is, therefore, quite complex. Several factors influence this relationship and should be considered in order to fully grasp this mechanism, such as the interaction between the different operators, the performance of the single operators and the variation of the selection rates.

8 Conclusions and future work

In this work, we proposed a novel adaptive operator selection scheme to identify the optimal crossover operator online. We considered two different reward measure strategies, the diversity-based reward (DBR) and the proportional reward (PR), as well as two different operator selection rules, namely the instantaneous reward (IR) and a concurrent approach (CA). The proposed AOS combines a set of four fitness landscape analysis measures with an online learning algorithm to predict the most suitable crossover operator. We have chosen four FLA metrics to be used as inputs of our predictive model: accumulated escape probability, dispersion metric, average neutrality ratio and average delta fitness of neutral networks. These metrics have been chosen because (1) they can be computed without significantly increasing the computational effort and (2) they complement each other by capturing different features of the landscapes. Three versions of the MAENS* (Consoli and Yao 2014) algorithm were implemented and tested on two datasets of CARP instances. The results of these experiments were compared against those of state-of-the-art algorithms and against an oracle. The results achieved by MAENS*-II show that this technique is able to compete with the state-of-the-art techniques and can, in some cases, exploit the multiple measures to outperform them. In the dataset containing large CARP instances, MAENS*-II was able to outperform all the existing approaches in terms of average and best solution quality in half of the instances, and even discovered new best-known solutions.

Our experiments seem to suggest a better performance of the concurrent strategy over the instantaneous reward, and a comparable performance of the two reward measure strategies.

This work leaves room for several interesting directions. First, the two reward measures might be combined to generate a novel measure that is able to better predict both the diversity and the survival ability of the offspring. Secondly, it would be interesting to test the behaviour of our algorithm when adopting an average or extreme reward strategy and when using different base learners. Adaptive operator selection might also be extended to different cases. In particular, for MAENS, an AOS strategy can be adapted to choose among different parent selection strategies for the crossover operator, to analyze its impact on the offspring generation. Another direction is that of reducing the computational cost of MAENS*-II. Furthermore, due to the improved optimization ability provided by this approach, it would be interesting to test the use of MAENS*-II as the single-objective routine for existing decomposition-based approaches. Finally, our technique might be adopted to improve the performance of evolutionary algorithms for other combinatorial optimization problems.