Journal of the Operational Research Society, Volume 63, Issue 3, pp 392–405

An empirical study of hyperheuristics for managing very large sets of low level heuristics

  • S Remde
  • P Cowling
  • K Dahal
  • N Colledge
  • E Selensky
General Paper


Hyperheuristics give us the appealing possibility of abstracting the solution method from the problem, since our hyperheuristic, at each decision point, chooses between different low level heuristics rather than different solutions as is usually the case for metaheuristics. By assembling low level heuristics from parameterised components we may create hundreds or thousands of low level heuristics, and there is increasing evidence that this is effective in dealing with every eventuality that may arise when solving different combinatorial optimisation problem instances since at each iteration the solution landscape is amenable to at least one of the low level heuristics. However, the large number of low level heuristics means that the hyperheuristic has to intelligently select the correct low level heuristic to use, to make best use of available CPU time. This paper empirically investigates several hyperheuristics designed for large collections of low level heuristics and adapts other hyperheuristics from the literature to cope with these large sets of low level heuristics on a difficult real-world workforce scheduling problem. In the process we empirically investigate a wide range of approaches for setting tabu tenure in hyperheuristic methods, for a complex real-world problem. The results show that the hyperheuristic methods described provide a good way to trade off CPU time and solution quality.


Keywords: computational analysis; heuristics; hyperheuristics; machine learning; optimisation; scheduling; tabu search

1. Introduction

The term hyperheuristics (Chakhlevitch and Cowling, 2008; Burke et al, 2010) was coined in Cowling et al (2001) to denote a class of heuristics which searches a space of low level heuristics, whereas metaheuristics typically search directly in the solution space. The hyperheuristic uses information about the performance of each low level heuristic (CPU time and solution quality metrics) to determine which low level heuristic(s) to apply at each decision point. The hyperheuristic method does not need to be problem specific, and hence a single hyperheuristic method has the advantage that it can work generally across many problem models and instances, given the right set of low level heuristics and solution quality metrics. There is good evidence to date that hyperheuristics are effective across a range of problems, and this effectiveness arguably arises since having a large collection of low level heuristics means that the solution landscape for one or more of these low level heuristics is likely to provide a good search direction (Chakhlevitch and Cowling, 2005; Colledge, 2009; Remde et al, 2009).

In some cases low level heuristics are parameterised, or composed by ‘multiplying’ together components (Chakhlevitch and Cowling, 2005; Remde et al, 2007), which can give rise to hundreds or even thousands of heuristics (Cowling and Chakhlevitch, 2003). In such a case, deciding in reasonable time which heuristics to use may be difficult. However, there is evidence that having such a rich selection of low level heuristics may yield better results for complex problems in the long run, although it is difficult to know in advance which low level heuristics will prove effective (Chakhlevitch and Cowling, 2005). Remde et al (2007) studied the low level heuristics used by a greedy hyperheuristic HyperGreedy (which picks the best performing low level heuristic at each iteration) and found that a quarter of the low level heuristics were never used and half of the low level heuristics were effective less than 1% of the time. However, the low level heuristics which were effective varied from problem instance to problem instance, and it was difficult to predict which low level heuristics would prove effective. In this paper we investigate hyperheuristic approaches that attempt to learn which low level heuristics will perform poorly and ignore them to produce solutions of a quality similar to those produced using the full set of low level heuristics, in a fraction of the CPU time. This is particularly of interest since hyperheuristic methods have demonstrated their effectiveness at solving problems such as automated planograms (Bai and Kendall, 2005), examination scheduling (Burke et al, 2003), personnel scheduling (Cowling and Chakhlevitch, 2003), workforce scheduling (Remde et al, 2007) and artificial intelligence in computer games (Nareyek, 2004). 
These applications have shown that hyperheuristics offer some of the solution quality we would associate with tailored methods, but that they are very flexible in dealing with different problem instances (and indeed different problems) and that they remain effective when the problem is changed in significant ways, without requiring substantial intervention from a human expert (Kendall and Hussin, 2005a).

In this paper we investigate a workforce scheduling problem, which we have studied for several years now in collaboration with Trimble MRM Ltd. It contains as sub-problems the Resource Constrained Project Scheduling Problem (Kolisch and Hartmann, 2006), Job Shop Scheduling Problem (Pinedo and Chao, 1999) and Vehicle Routing Problem with Time Windows (Toth and Vigo, 2001). In this real-world situation we have several further constraints and objectives, and in particular we consider the notion of the degree of competence of a particular group of resources (eg, engineers) which have been allocated to a particular task. Trimble MRM Ltd. develops scheduling solutions for very large, complex mobile workforce scheduling problems in a variety of industries, particularly telecommunications and utilities. The model which we investigate here encapsulates the main features common to these problems.

We use the problem and the hyperheuristic framework of Remde et al (2007) as a test bed for our investigation in this paper. The complexity of this problem means that evaluating the quality of a perturbed solution takes a significant amount of CPU time, whereas only limited CPU time is available to find good solutions in practice. Many of the low level heuristics in our hyperheuristic framework require the evaluation of hundreds of these perturbed solutions, because they solve smaller parts of the problem optimally using systematic search or investigate a large neighbourhood. The available CPU time renders local search-based hyperheuristics infeasible in our practical application, so we only consider constructive hyperheuristics in this paper. The number of low level heuristics available to the hyperheuristics is large compared to the number of low level heuristic applications needed to construct a solution, and new approaches are needed to learn the effectiveness of low level heuristics in this case. Chakhlevitch and Cowling (2005) find that when a small random subset of low level heuristics is selected, results can be erratic across different problem instances, indicating that a reduced subset is not as good as a large set.

Our approaches for deciding which low level heuristic to apply can be considered as tabu search-based hyperheuristics where we learn appropriate tabu tenures. Binary Exponential Back Off (BEBO) (Remde et al, 2009) is a tabu search-based hyperheuristic with dynamically adapting tabu tenures designed for very large neighbourhoods. It is based on an analogy with computer networks, where binary exponential back-off or truncated binary exponential back-off is a randomised protocol for regulating transmission on a multiple access broadcast channel (Kwak et al, 2005). In this paper we carry out a thorough investigation of this approach alongside other approaches to setting tabu tenure in hyperheuristics from the literature. Nareyek (2004) and Cowling et al (2001) use reinforcement learning to estimate the future performance of a low level heuristic. When a move performs well it is positively reinforced. When it performs poorly it is negatively reinforced. We investigate this reinforcement mechanism for a large collection of low level heuristics, as well as the step-by-step reduction (SSR) methods of Chakhlevitch and Cowling (2005).

This paper is structured as follows: we present related work in Section 2. Our problem is presented in Section 3. Section 4 describes the hyperheuristic framework and hyperheuristics in detail. In Section 5 we empirically investigate the techniques and compare them to Variable Neighbourhood Search, Greedy, Random and tabu search-based heuristics in terms of solution quality and computational time. We present conclusions in Section 6.

2. Related work

Many hyperheuristics are based on metaheuristic methods, including early work in Fang et al (1994), where a genetic algorithm evolved a chromosome which determined how jobs were scheduled in open shop scheduling. In Bai and Kendall (2005) simulated annealing is used to decide whether to accept the solution resulting from a randomly applied low level heuristic. Kendall et al (2002) use a genetic algorithm to evolve good sequences of low level heuristics. Chakhlevitch and Cowling (2005) use two learning approaches, SSR and Warming Up, to reduce the number of low level heuristics, and show that SSR produced better results. SSR periodically removes a percentage of the low level heuristics to try to reduce the set to an elite few. This removes bad heuristics early on and saves the CPU time that would have been used trying them, but suffers from the potential limitation that it does not allow these heuristics to be reintroduced later in the search. The work of Remde et al (2007) shows that a low level heuristic's effectiveness is highly variable during search, with some low level heuristics which are ineffective at the start of search proving highly effective at the end.

Tabu Search (Glover and Laguna, 1997) is used to prevent the repeated application of poor moves, or the undoing of good moves, for a certain number of iterations (the tabu tenure). The best tabu tenure has been investigated in several papers and is most likely a function of the neighbourhood size and the problem size (Laguna et al, 1999). Using random tabu tenures, where a move is made tabu for a period chosen uniformly at random between 1 and the maximum tabu tenure, tends to work better than fixed tabu tenures (Rolland et al, 1996).

Several papers have investigated tabu search-based hyperheuristics. A tabu mechanism is used in Kendall and Hussin (2005a) where poorly performing low level heuristics are made tabu for a fixed tabu tenure. A small number of low level heuristics (13) are used with short tabu tenures (1–4 iterations) and good results are obtained in a large amount of CPU time. This is also considered in Kendall and Hussin (2005b) where the low level heuristic is repeated until no further improvements can be found before being made tabu and random tabu tenures are utilised. Random tabu tenures provide results of similar quality to those with fixed tenure equal to the expected random tenure on the two examination timetabling problem instances they consider. They find that repeated application of a low level heuristic does not increase solution quality considerably, possibly due to an increased tendency to get stuck in large basins of attraction. A large set (95) of low level heuristics is used in Cowling and Chakhlevitch (2003) where the hyperheuristic allows a tabu low level heuristic to become aspirated and be used (Glover and Laguna, 1997) if it makes the best improvement. If no improving low level heuristic is available, a non-improving non-tabu low level heuristic is used and made tabu. Fixed tabu tenures of 10, 30, 60 and 100 and adaptive tabu tenures are investigated, but results provide no clear advantage of using adaptive tabu tenures over fixed ones. A ranking system, based on reinforcement learning, is used for non-tabu low level heuristics in Burke et al (2003). At each iteration the non-tabu low level heuristic with the highest rank is applied. When a non-tabu low level heuristic performs well its rank is increased, otherwise its rank is decreased and the low level heuristic is put in the tabu list on a first-in-first-out basis. If the highest ranked low level heuristic makes the solution worse, the tabu list is emptied.
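The ranking scheme described above for Burke et al (2003) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `rank_tabu_step`, the `evaluate` callback (returning the fitness change a heuristic would produce) and the fall-back used when every heuristic is tabu are hypothetical, and we read "makes the solution worse" as a strictly negative change that empties the list after the rank update. A `deque` with `maxlen` provides the first-in-first-out tabu list.

```python
from collections import deque
from typing import Callable, Deque, Dict, Hashable, Tuple


def rank_tabu_step(ranks: Dict[Hashable, int],
                   tabu: Deque[Hashable],
                   evaluate: Callable[[Hashable], float]) -> Tuple[Hashable, float]:
    """One iteration of a rank-based tabu hyperheuristic, loosely after Burke et al (2003).

    ranks    -- current rank of every low level heuristic (updated in place)
    tabu     -- FIFO tabu list; a deque with maxlen silently drops its oldest entry
    evaluate -- hypothetical callback returning the fitness change of a heuristic
    """
    candidates = [h for h in ranks if h not in tabu]
    if not candidates:                      # everything tabu: fall back to all (assumption)
        candidates = list(ranks)
    chosen = max(candidates, key=ranks.__getitem__)  # highest-ranked non-tabu heuristic
    delta = evaluate(chosen)
    if delta > 0:
        ranks[chosen] += 1                  # performed well: promote
    else:
        ranks[chosen] -= 1                  # performed poorly: demote and make tabu
        tabu.append(chosen)
        if delta < 0:                       # solution got worse: empty the tabu list
            tabu.clear()
    return chosen, delta
```

Because the deque discards its oldest entry when full, its `maxlen` plays the role of a fixed tabu tenure measured in demotions rather than iterations.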

Nareyek (2004) experimentally investigates several variations of reinforcement learning in a hyperheuristic framework. Different reinforcement schemes are used in two different problems and it is concluded that high rates of negative reinforcement and low rates of positive reinforcement work best. The choice function (Cowling et al, 2001) is another machine learning hyperheuristic that attempts to estimate how well a low level heuristic is likely to perform based on its effect on the (single) objective function, the pair-wise interaction between low level heuristics and the time since it was last used. These two papers combine ideas from machine learning and tabu search.

The hyperheuristic framework of Remde et al (2007) proposes a method to break down the Trimble MRM workforce scheduling problem, by splitting it into smaller parts and solving each part using exact enumerative approaches. These smaller parts are the combination of a method to select a task and a method to select potential resources, including time, for the task. Reduced Variable Neighbourhood Search (rVNS) (Mladenović and Hansen, 1997) and simple hyperheuristics are shown to be effective in deciding the order in which to solve sub-problems.

As can be seen above, there has been a wealth of interesting work in the area of hyperheuristics, although there have only been limited comparative studies to date. One of the principal contributions of this paper is the first thorough empirical investigation of tabu/reinforcement learning/ranking methods alongside binary exponential back-off and step-by-step reduction methods for a difficult real-world problem.

3. Problem description

The workforce scheduling problem that we consider consists of four main components: Tasks, Resources, Skills and Locations. A task Ti is a job or part of a job. Each task must start and end at a specified location. Usually the start and end locations are the same, but they may be different. Each task has one or more time windows and some time windows have an associated penalty. We have a set {T1,T2, …, Tn} of tasks to be completed. Each task is executed by one or more resources {R1,R2, …, Rm}. A task requires resources with the appropriate skills from the set {S1,S2, …, Sk}. Task Ti requires skills [TS1i,TS2i, …, TSt(i)i] with work requirements [w1i,w2i, …, wt(i)i] where wqi is the amount of skill TSqi required for task Ti. Task Ti also has an associated priority p(Ti). Resource Rj possesses skills [RS1j,RS2j, …, RSr(j)j]. A function c(R,S) expresses the competence of resource R at skill S, relative to an average competency. Each resource R travels from location to location at speed v(R). For tasks T1, T2, d(T1, T2) is the distance between the end location of T1 and the start location of T2.

There are three main groups of constraints: task constraints, resource constraints and location constraints.

3.1. Task constraints

  • Each task can be worked on only within specified time windows.

  • Some tasks require other tasks to have been completed before they can begin (precedence constraints).

  • Some tasks require other tasks to be started at the same time (assist constraints).

  • Tasks may be split across breaks within a working day.

  • For a task to be scheduled it must have exactly one resource assigned to it for each of the skills it requires.

  • All assigned resources have to be available at the task's location for its whole duration regardless of their skill competency and task skill work requirement.

  • If a task Ti with skill requirements [TS1i,TS2i, …, TSt(i)i] and amounts [w1i,w2i, …, wt(i)i] is carried out by resources [R1i,R2i, …, Rt(i)i] then the time taken is $\max_{1 \le q \le t(i)} w_{qi} / c(R_{qi}, TS_{qi})$ (ie, the greatest time taken for any single resource to complete a skill requirement).
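As a quick illustration of the duration rule in the last task constraint, the sketch below reads it as "the slowest assigned resource determines the duration" and assumes that a resource with competence c (relative to an average competence of 1.0) needs w/c time units to deliver w units of work on one skill. `task_duration` is a hypothetical helper, not part of the framework.

```python
from typing import Sequence


def task_duration(work: Sequence[float], competence: Sequence[float]) -> float:
    """Duration of a task under the 'slowest resource' rule (hypothetical helper).

    work[q]       -- work requirement w_qi for the task's q-th required skill
    competence[q] -- c(R_qi, TS_qi) of the resource assigned to that skill,
                     relative to an average competency of 1.0
    Assumption: one resource needs work / competence time units for one skill.
    """
    if len(work) != len(competence):
        raise ValueError("exactly one resource must be assigned per required skill")
    return max(w / c for w, c in zip(work, competence))


# Two skills: 4 units at twice-average competence, 3 units at half-average competence.
print(task_duration([4.0, 3.0], [2.0, 0.5]))  # 6.0: the second resource is the bottleneck
```

Note that every assigned resource is occupied for the full duration, including the faster one, as the constraint on resource availability requires.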

3.2. Resource constraints

  • A resource R travels from location to location at a fixed speed v(R).

  • Resources may only work and travel during specified time windows.

  • Resources can only work on one task at a time and only apply one skill at a time.

3.3. Location constraints

  • Resources (generally engineers in vans or large pieces of equipment) must travel to the start location of each task they work on, and are unavailable during this travel time.

  • Resources must start and end each day at a specified ‘home’ location and must have sufficient time to travel to and from their home location at the start and end of each day.

When building a schedule many different and often contradictory business objectives are possible. In this paper we consider three objectives. The first objective is Schedule Priority (SP), given by

$$SP = \sum_{T_i \text{ scheduled}} p(T_i).$$

Maximising Schedule Priority maximises the value of the tasks scheduled and implicitly minimises the value of tasks left unscheduled.

The second objective measures total Travel Time (TT) across all resources. Define A = {(i1, i2, j): task Ti1 comes immediately before Ti2 in the schedule of resource Rj}. Then

$$TT = \sum_{(i_1, i_2, j) \in A} \frac{d(T_{i_1}, T_{i_2})}{v(R_j)}.$$

Travel to and from home locations is handled by considering dummy tasks fixed at the start and end of the working day, at the home location of each resource.

The third objective measures the inconvenience associated with completing tasks or using resources at an inconvenient time, which we have labelled Schedule Cost (SC). To express this accurately we represent the time windows for resource R using a function X (Baptiste et al, 2001), where X(R,t) is the cost per unit time for resource R working at time t. We introduce a variable y(R,t), equal to 1 if resource R is working at time t and 0 otherwise. Similarly, X(T,t) is the cost per unit time for task T being executed at time t, and z(T,t) equals 1 if task T is being executed at time t and 0 otherwise, so that

$$SC = \sum_{R} \int X(R,t)\, y(R,t)\, dt + \sum_{T} \int X(T,t)\, z(T,t)\, dt.$$

In this paper, the fitness of a schedule is given by a single weighted objective function, f=SP−4SC−2TT, where SP is the sum of the priority of scheduled tasks, SC is the sum of the resource and task time window costs in the schedule and TT is the total amount of travel time. This objective is to maximise the total priority of tasks scheduled while minimising travel time and cost. Values are scaled so that this expression is a realistic representation of solution quality for a large class of problems encountered in practice (Cowling et al, 2006).
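To make the weighting concrete, the sketch below computes SP and TT from their definitions in this section and combines them with SC into f = SP − 4SC − 2TT. It is illustrative only: the dictionaries, task and resource names and numbers are hypothetical, SC is passed in as an already-computed value, and travel time along an arc of A is taken as distance divided by the resource's speed v(R).

```python
from typing import Dict, Iterable, Tuple

Arc = Tuple[str, str, str]  # (task before, task after, resource): one element of the set A


def schedule_priority(scheduled: Iterable[str], priority: Dict[str, float]) -> float:
    """SP: sum of p(T) over the scheduled tasks."""
    return sum(priority[t] for t in scheduled)


def travel_time(arcs: Iterable[Arc], distance: Dict[Tuple[str, str], float],
                speed: Dict[str, float]) -> float:
    """TT: for each consecutive pair on a resource's schedule, distance / speed.
    Home locations are modelled as dummy tasks at the start and end of the day."""
    return sum(distance[(a, b)] / speed[r] for a, b, r in arcs)


def fitness(sp: float, sc: float, tt: float) -> float:
    """Weighted single objective: f = SP - 4 SC - 2 TT."""
    return sp - 4 * sc - 2 * tt


priority = {"T1": 100.0, "T2": 50.0}
arcs = [("HOME_R1_start", "T1", "R1"), ("T1", "T2", "R1"), ("T2", "HOME_R1_end", "R1")]
distance = {("HOME_R1_start", "T1"): 10.0, ("T1", "T2"): 20.0, ("T2", "HOME_R1_end"): 10.0}
speed = {"R1": 2.0}

sp = schedule_priority(["T1", "T2"], priority)   # 150.0
tt = travel_time(arcs, distance, speed)          # (10 + 20 + 10) / 2 = 20.0
print(fitness(sp, sc=5.0, tt=tt))                # 150 - 20 - 40 = 90.0
```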

4. Hyperheuristic approaches

Previous hyperheuristic work on this workforce scheduling problem (Remde et al, 2007, 2009) generated possible low level heuristics by combining two components: (1) selecting the next task to be scheduled and (2) allocating potential resources (including time) for that task. The task selector (Table 1) chooses a task and the resource allocator (Table 2) assigns resources for each skill required by the task, so that the total number of low level heuristics is the number of task selectors multiplied by the number of resource allocators (Figure 1 shows how the low level heuristics work). Note that where a resource allocator is parameterised, we consider each parameter set as an individual resource allocation heuristic. We combine each of the nine different task selectors with each of the 27 different resource allocators to give a total of 243 low level heuristics. Our low level heuristics maintain a feasible solution—if the low level heuristic cannot make a legal move then the solution is not modified.
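The composition of low level heuristics is simply a Cartesian product of the two component sets. In the sketch below the selector and allocator labels are hypothetical placeholders rather than the names used in the framework; only the counts (9 and 27) come from the text.

```python
from itertools import product

# Hypothetical placeholder labels for the nine task selectors (Table 1) and
# the 27 parameterised resource allocators (Table 2).
task_selectors = [f"TS{i}" for i in range(1, 10)]
resource_allocators = [f"RA{j}" for j in range(1, 28)]

# Every low level heuristic is one (task selector, resource allocator) pair.
low_level_heuristics = list(product(task_selectors, resource_allocators))
print(len(low_level_heuristics))  # 9 * 27 = 243
```

Adding one more allocator parameter set adds nine heuristics, which is why parameterised components inflate the set so quickly.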
Table 1

Task selectors

  • Tasks are ordered at random
  • Tasks are ordered by their priority in descending order
  • Tasks are ordered by their priority multiplied by the number of resources required, in descending order
  • Tasks are ordered by their priority in ascending order
  • Tasks are ordered by their number of precedences, ascending
  • Tasks are ordered by their number of precedences, descending
  • Tasks are ordered by their estimated priority per hour, assuming the task will take as long as the total skill requirement
  • Tasks are ordered by their estimated priority per hour, assuming the task will take as long as the maximum skill requirement
  • Tasks are ordered by their estimated priority per hour, assuming the task will take as long as the average skill requirement
Table 2

Resource allocators

  • Best x–y: Orders the available resources by their competency at the task, then chooses the resources ranked from x to y in the list (x–y values considered are: 1–5, 6–10, 11–15, 16–20, 21–25, 26–30, 31–35, 36–40, 1–10, 11–20, 21–30, 31–40, 1–2, 3–4, 5–6, … etc, 2–3, 3–4, 4–5, …etc, 1–4, 3–6, 5–8, 7–10, … etc, 5–14, 15–24, 25–34).
  • Deviation x: Resources complete a skill in a time dependent upon their competence. This allocator attempts to find resources that will complete the different skills of the task in the same amount of time, by selecting resources with competencies that deviate by x ∈ {50%, 25%, 12.5%, 6.25%} from the task's skill requirement.
  • xth Quarter: Picks the xth quarter (x ∈ {1,2,3,4}) of the resources for the task, ranked by skill. Unlike the ‘Top x’ task selectors, the number chosen is proportional to the number of resources who can do the task.
  • xth Eighth: Picks the xth eighth (x ∈ {1,…,8}) of the resources for the task, ranked by skill.
  • Dynamic x: Picks larger sets of resources for the skills requiring more effort and smaller sets for those requiring less effort. It creates x ∈ {10, 50, 100, 1000} combinations when enumerating the resulting sets.
  • All resources: Considers all possible resources (and hence is very slow).

Each parameter set yields a separate resource allocation heuristic (eg, Best 1–5; Deviation 25%).
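Two of the simpler allocators can be sketched as slices of a competence-ranked list. This is a hedged illustration: the helper names are hypothetical, competence is a plain dictionary standing in for c(R,S), and the rounding used to split the list into quarters or eighths is our assumption, since the table does not specify it.

```python
import math
from typing import Dict, List, Sequence


def rank_by_competence(resources: Sequence[str], competence: Dict[str, float]) -> List[str]:
    """Most competent first (ties keep their original order)."""
    return sorted(resources, key=lambda r: competence[r], reverse=True)


def best_x_y(ranked: Sequence[str], x: int, y: int) -> List[str]:
    """'Best x-y': the resources ranked x to y inclusive (1-based)."""
    return list(ranked[x - 1:y])


def xth_part(ranked: Sequence[str], x: int, parts: int) -> List[str]:
    """'xth Quarter' (parts=4) or 'xth Eighth' (parts=8): the x-th equal slice,
    so the number chosen grows with the number of capable resources."""
    n = len(ranked)
    start = math.floor((x - 1) * n / parts)
    end = math.floor(x * n / parts)
    return list(ranked[start:end])


resources = [f"R{i}" for i in range(1, 21)]
competence = {r: 2.0 - 0.05 * i for i, r in enumerate(resources)}  # R1 most competent
ranked = rank_by_competence(resources, competence)
print(best_x_y(ranked, 1, 5))   # ['R1', 'R2', 'R3', 'R4', 'R5']
print(xth_part(ranked, 2, 4))   # ['R6', 'R7', 'R8', 'R9', 'R10']
```

With 40 capable resources instead of 20, `best_x_y(ranked, 1, 5)` would still return five resources, while the second quarter would double to ten.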

Figure 1

Resource allocators. The dotted subset of resources possessing the required skill is chosen by a resource allocator. The assignment (R2, R1) is chosen as the best insertion.

The hyperheuristic HyperRandom (Remde et al, 2007) selects at random a low level heuristic (ie, a (task order, resource allocator) pair) to use at each iteration and applies it if the application will result in an improvement. This continues until no improvement has been found for a certain number of iterations. HyperGreedy (Remde et al, 2007) evaluates all the low level heuristics at each iteration and applies the best if it makes an improvement. This continues until no improvement is found. As might be expected, HyperGreedy is very CPU-intensive, but generates good quality results.
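Both baseline hyperheuristics can be sketched against two hypothetical callbacks: `evaluate(h)`, which returns the fitness change that low level heuristic `h` would make, and `apply(h)`, which commits it. The `patience` parameter stands in for the unspecified "certain number of iterations" without improvement; its value in any usage is an assumption.

```python
import random
from typing import Callable, Hashable, Sequence

Evaluate = Callable[[Hashable], float]   # fitness change if h were applied
Apply = Callable[[Hashable], None]       # commit h to the schedule


def hyper_greedy(heuristics: Sequence[Hashable], evaluate: Evaluate, apply: Apply) -> int:
    """Evaluate every heuristic each iteration; apply the best while it improves."""
    applied = 0
    while True:
        scores = {h: evaluate(h) for h in heuristics}
        best = max(heuristics, key=scores.__getitem__)
        if scores[best] <= 0:
            return applied
        apply(best)
        applied += 1


def hyper_random(heuristics: Sequence[Hashable], evaluate: Evaluate, apply: Apply,
                 patience: int, rng: random.Random) -> int:
    """Try one random heuristic per iteration; stop after `patience` consecutive failures."""
    applied, failures = 0, 0
    while failures < patience:
        h = rng.choice(heuristics)
        if evaluate(h) > 0:
            apply(h)
            applied += 1
            failures = 0
        else:
            failures += 1
    return applied
```

The difference in cost is visible in the loops: HyperGreedy calls `evaluate` once per heuristic per iteration (243 calls with the full set), whereas HyperRandom calls it once per iteration.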

As the low level heuristics are constructive, we can only apply each one a small number of times before the schedule is full: approximately 250 tasks can fit in the schedule of the problems we study, compared to 243 low level heuristics. Hyperheuristics which rely on learning from the application of a low level heuristic during a single long search run would therefore be ineffective, as the schedule would be full before significant information had been learnt. Note that it is coincidental that the number of low level heuristics and the number of tasks are similar in this case. Reinforcement learning-based hyperheuristics need to be modified in this case to learn from all the different low level heuristics tried in a situation, and not simply the one that was applied in the end; otherwise the vast majority of information gained about low level heuristics would be discarded. This paper investigates and empirically compares hyperheuristic performance in this situation.

Sections 4.1, 4.2 and 4.3 describe the hyperheuristics which attempt to learn information about the low level heuristics and their potential to improve the solution.

4.1. Binary Exponential Back Off

BEBO is a tabu search-based hyperheuristic with dynamically adapting tabu tenures designed for very large neighbourhoods, inspired by the binary exponential back-off algorithm used as the industry standard to transmit packets in a network (Kwak et al, 2005). The network protocol aims to increase throughput by exponentially increasing the time between retransmissions when a collision occurs, that is, when two or more computers try to transmit information on the same medium (a wire, a wireless frequency, etc) at the same time. When a collision occurs, the device increases its back-off value by 1 and waits a random amount of time between 0 and 2^backoff − 1 time slots before trying to retransmit. If the transmission is successful, the back-off value is reset to 0; otherwise it is increased by 1 again and the process is repeated. Hence the expected waiting time grows exponentially with the number of consecutive collisions.
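The network protocol itself is easy to simulate. The sketch below follows the description above; the cap `max_backoff` models the "truncated" variant and its default of 10 is illustrative, and the function name is hypothetical.

```python
import random
from typing import Iterable, List


def backoff_waits(outcomes: Iterable[bool], rng: random.Random,
                  max_backoff: int = 10) -> List[int]:
    """Truncated binary exponential back-off.

    outcomes -- True for a successful transmission, False for a collision.
    Returns the number of slots waited after each collision.
    """
    backoff = 0
    waits = []
    for success in outcomes:
        if success:
            backoff = 0                                     # success: reset
        else:
            backoff = min(backoff + 1, max_backoff)         # collision: back off further
            waits.append(rng.randint(0, 2 ** backoff - 1))  # uniform in [0, 2^backoff - 1]
    return waits


rng = random.Random(42)
print(backoff_waits([False] * 5, rng))  # the k-th wait is at most 2**k - 1
```

The upper bound on the wait doubles with every consecutive collision, which is the property BEBO borrows for tabu tenures.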

We use an analogous back-off method to exponentially increase the tabu tenure of low level heuristics which repeatedly yield no improvement, so that the expected time between trials of bad heuristics increases exponentially. Pseudocode for the hyperheuristic is given in Figure 2. We use two methods to decide which of the low level heuristics tried in the current iteration to back off (those ‘deemed bad’): in BEBO Best x, all but the best x improving low level heuristics are backed off; in BEBO Prop x, all non-improving low level heuristics are backed off, together with those improving low level heuristics not in the top x% of the range of the fitness.
Figure 2

The Binary Exponential Back Off (BEBO) hyperheuristic.

backoff_min is set to 4, as this value provided good results in Remde et al (2009). Although no empirical investigation of different values has been carried out, increasing this value decreases both CPU time and fitness, and decreasing it increases both. Incidentally, when this value is 0 the hyperheuristic is equivalent to HyperGreedy, as low level heuristics are never made tabu.
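The following is a hedged sketch of the BEBO Best x idea, not a transcription of Figure 2. Assumptions: `evaluate(h)` and `apply(h)` are hypothetical callbacks returning the fitness change of heuristic `h` and committing it; a heuristic that is backed off becomes tabu for a tenure drawn uniformly from 0 to 2^backoff − 1 iterations; `max_backoff` is a hypothetical truncation cap; the backoff_min parameter is omitted because its exact role is defined in Figure 2; and when no non-tabu heuristic improves, all tabu status is lifted once before stopping.

```python
import random
from typing import Callable, Dict, Hashable, Sequence


def bebo_best_x(heuristics: Sequence[Hashable],
                evaluate: Callable[[Hashable], float],
                apply: Callable[[Hashable], None],
                x: int, rng: random.Random, max_backoff: int = 10) -> int:
    """Binary exponential back-off over low level heuristics, 'Best x' variant."""
    backoff: Dict[Hashable, int] = {h: 0 for h in heuristics}
    free_at: Dict[Hashable, int] = {h: 0 for h in heuristics}  # first non-tabu iteration
    it = 0
    while True:
        active = [h for h in heuristics if free_at[h] <= it]
        scores = {h: evaluate(h) for h in active}
        improving = sorted((h for h in active if scores[h] > 0),
                           key=scores.__getitem__, reverse=True)
        if not improving:
            if len(active) == len(heuristics):
                return it                      # nothing improves at all: stop
            for h in heuristics:               # assumption: lift tabu status and retry once
                free_at[h] = it
            continue
        apply(improving[0])                    # apply the best improving heuristic
        kept = set(improving[:x])              # the best x are not backed off
        for h in active:
            if h in kept:
                backoff[h] = 0
            else:                              # 'deemed bad': back off exponentially
                backoff[h] = min(backoff[h] + 1, max_backoff)
                free_at[h] = it + 1 + rng.randint(0, 2 ** backoff[h] - 1)
        it += 1
```

For BEBO Prop x, only the line computing `kept` would change: keep the improving heuristics whose score lies within the top x% of the range of scores seen in that iteration.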

4.2. Reinforcement learning

Reinforcement learning is a machine learning technique that positively reinforces good choices and negatively reinforces bad choices (Kaelbling et al, 1996). Nareyek (2004) proposes a reinforcement learning hyperheuristic framework in order to investigate several approaches to learning empirically. A utility value, which estimates its future potential, is associated with each choice. At each iteration a choice is made based on these utility values, and the utility value is then adjusted depending on the outcome of that choice and the learning mechanism used. Five methods for positively reinforcing the utility and five methods for negatively reinforcing it are evaluated experimentally, and in general low rates of positive reinforcement and high rates of negative reinforcement gave the best results. In our experiments we use the adaptation scheme found to be best in Nareyek (2004): for low level heuristic i, positive reinforcement sets u_i := u_i + 1 and negative reinforcement sets u_i := √u_i.

Nareyek's experiments ran for 10 000 iterations with a very small number of low level heuristics (5 and 6 in the two different problems), so each heuristic could be applied many times, giving the hyperheuristic time to learn. In the problem we study this ratio of iterations to low level heuristics is drastically smaller, and our low level heuristics (choices) nearly always make a positive change in fitness, so the hyperheuristic needs to be modified.

At each iteration, several of the low level heuristics with the highest utility are tried instead of just one. The best performing of these is positively reinforced and applied, and the rest are negatively reinforced. The percentage of low level heuristics tried gives us the ability to trade off CPU time and solution quality. The pseudocode for the reinforcement learning hyperheuristic is given in Figure 3. In our computational experiments, Nareyek x% denotes the variant in which the low level heuristics whose utility is in the top x% are tried.
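A minimal sketch of one iteration of Nareyek x%, under stated assumptions: utilities live in a plain dictionary and start at 0 (a hypothetical choice), `evaluate(h)` and `apply(h)` are hypothetical callbacks returning the fitness change of heuristic `h` and committing it, the number tried is rounded up to at least one, and when even the best tried heuristic fails to improve, nothing is applied and every tried heuristic is negatively reinforced.

```python
import math
from typing import Callable, Dict, Hashable, Optional


def nareyek_step(utility: Dict[Hashable, float],
                 evaluate: Callable[[Hashable], float],
                 apply: Callable[[Hashable], None],
                 top_fraction: float) -> Optional[Hashable]:
    """One iteration of 'Nareyek x%' with top_fraction = x / 100.

    Tries the heuristics with the highest utility, applies and positively
    reinforces the best one if it improves, and negatively reinforces the rest.
    """
    n_try = max(1, math.ceil(top_fraction * len(utility)))
    tried = sorted(utility, key=utility.__getitem__, reverse=True)[:n_try]
    scores = {h: evaluate(h) for h in tried}
    best = max(tried, key=scores.__getitem__)
    improved = scores[best] > 0
    for h in tried:
        if h == best and improved:
            utility[h] += 1                       # positive: u := u + 1
        else:
            utility[h] = math.sqrt(utility[h])    # negative: u := sqrt(u)
    if improved:
        apply(best)
        return best
    return None
```

The asymmetry matches the scheme above: a success adds one unit, while a failure shrinks a large utility sharply (16 becomes 4, then 2).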
Figure 3

Reinforcement learning-based hyperheuristic.

4.3. Tabu search-based hyperheuristics

The tabu search-based hyperheuristics from the literature cannot be applied straightforwardly, since the number of low level heuristics here is large and the number of iterations is small. Our tabu search-based hyperheuristics try all non-tabu low level heuristics at every iteration, and the top x at every iteration are not made tabu. This keeps a pool of good low level heuristics available for the next iteration, and x can be adjusted to trade off CPU time against solution quality. Hence we do not need to reset tabu tenures periodically as in Burke et al (2003). We give the pseudocode in Figure 4.
Figure 4

Tabu search-based hyperheuristic.

We try tabu tenures of t = 5, 7, 10, 25 and 50 each time a low level heuristic is tried and fails to give an improvement. We also investigate different methods of deciding which low level heuristics to make tabu. Tabu Best x t=y signifies that all but the top x improving low level heuristics are made tabu with tenure y at each iteration. This is similar to the method used in Burke et al (2003), with larger tabu tenures since there are more low level heuristics in our case. We also investigated making all non-improving low level heuristics tabu; however, the results were very poor in terms of CPU time, because nearly all of the low level heuristics make a positive, although very small, improvement early in the search, so almost none of them were made tabu. These results are not reported here. In addition to these fixed tenures, we try random tenures as used in Kendall and Hussin (2005b): rTabu Best x t=y is similar to Tabu Best x t=y, but with a random tenure between 0 and y each time a low level heuristic is made tabu.
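Both fixed and random tenures fit in one hedged sketch. Passing `rng=None` gives Tabu Best x t=y with a fixed tenure y; passing a `random.Random` gives rTabu Best x t=y with a tenure drawn uniformly from 0 to y. The callbacks `evaluate(h)` (fitness change of heuristic `h`) and `apply(h)` (commit it), the function name and the stopping rule (lift all tabu status once when no non-tabu heuristic improves, then stop) are assumptions rather than the content of Figure 4.

```python
import random
from typing import Callable, Dict, Hashable, Optional, Sequence


def tabu_best_x(heuristics: Sequence[Hashable],
                evaluate: Callable[[Hashable], float],
                apply: Callable[[Hashable], None],
                x: int, tenure: int,
                rng: Optional[random.Random] = None) -> int:
    """'Tabu Best x t=y' (rng is None) or 'rTabu Best x t=y' (random tenures)."""
    free_at: Dict[Hashable, int] = {h: 0 for h in heuristics}  # first non-tabu iteration
    it = 0
    while True:
        active = [h for h in heuristics if free_at[h] <= it]
        scores = {h: evaluate(h) for h in active}              # try all non-tabu heuristics
        improving = sorted((h for h in active if scores[h] > 0),
                           key=scores.__getitem__, reverse=True)
        if not improving:
            if len(active) == len(heuristics):
                return it                      # nothing improves at all: stop
            for h in heuristics:               # assumption: lift tabu status and retry once
                free_at[h] = it
            continue
        apply(improving[0])
        kept = set(improving[:x])              # the top x improving heuristics stay free
        for h in active:
            if h not in kept:
                t = rng.randint(0, tenure) if rng is not None else tenure
                free_at[h] = it + 1 + t
        it += 1
```

On a toy problem with three heuristics and a fixed tenure of 5, this makes 9 `evaluate` calls over three applications plus the final check, against 12 for a greedy loop that evaluates everything each time; that saving is the lever that trades CPU time against solution quality.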

4.4. Other methods

We compare these adapted methods to existing ones designed for large neighbourhoods and problem-specific heuristics.

rVNS is the best reduced Variable Neighbourhood Search method from Remde et al (2007) and is a handcrafted heuristic tailored to this problem. HyperRandom and HyperGreedy are the random and greedy hyperheuristics from Remde et al (2009). HyperGreedy is the benchmark for all the tests, as it is the most CPU-intensive approach and generally produces the best results. Sample x% is a variation of HyperRandom in which x% of the low level heuristics are sampled uniformly at random at each iteration and the best improving one is applied.
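One iteration of Sample x% can be sketched as follows. The callbacks `evaluate(h)` and `apply(h)` are hypothetical (fitness change of heuristic `h`, and committing it), the sample size is rounded up to at least one heuristic as an assumption, and the surrounding loop with its stopping rule is left to the caller.

```python
import math
import random
from typing import Callable, Hashable, Optional, Sequence


def sample_step(heuristics: Sequence[Hashable],
                evaluate: Callable[[Hashable], float],
                apply: Callable[[Hashable], None],
                fraction: float, rng: random.Random) -> Optional[Hashable]:
    """One iteration of 'Sample x%' (fraction = x / 100): evaluate a uniform random
    sample of the heuristics and apply the best one if it improves."""
    k = max(1, math.ceil(fraction * len(heuristics)))
    sample = rng.sample(list(heuristics), k)        # without replacement
    scores = {h: evaluate(h) for h in sample}
    best = max(sample, key=scores.__getitem__)
    if scores[best] > 0:
        apply(best)
        return best
    return None
```

With `fraction=1.0` every heuristic is sampled and the step behaves like one iteration of HyperGreedy; smaller fractions trade solution quality for fewer evaluations.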

SSR is presented in Chakhlevitch and Cowling (2005). The method SSR x% t=y reduces the set of low level heuristics by x% every y iterations. The low level heuristics which performed worst over the period, as measured by the objective value, are removed first; ties are broken at random.
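A hedged sketch of SSR x% t=y follows. Assumptions: each iteration evaluates every remaining heuristic and applies the best improving one; a heuristic's score over the period is the sum of the fitness changes it would have produced; the number removed is rounded down and at least one heuristic is always kept; ties are broken at random by shuffling before a stable sort. `evaluate(h)` and `apply(h)` are hypothetical callbacks.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def ssr_reduce(active: Sequence[Hashable], window: Dict[Hashable, float],
               fraction: float, rng: random.Random) -> List[Hashable]:
    """Drop the worst `fraction` of the active heuristics by score over the last period."""
    n_remove = min(int(fraction * len(active)), len(active) - 1)  # always keep one
    order = list(active)
    rng.shuffle(order)                        # random order first ...
    order.sort(key=window.__getitem__)        # ... so the stable sort breaks ties randomly
    removed = set(order[:n_remove])
    return [h for h in active if h not in removed]


def ssr(heuristics: Sequence[Hashable], evaluate: Callable[[Hashable], float],
        apply: Callable[[Hashable], None], fraction: float, period: int,
        rng: random.Random) -> Tuple[int, List[Hashable]]:
    """'SSR x% t=y' with fraction = x / 100 and period = y."""
    active = list(heuristics)
    window = {h: 0.0 for h in active}
    it = 0
    while True:
        scores = {h: evaluate(h) for h in active}
        best = max(active, key=scores.__getitem__)
        if scores[best] <= 0:
            return it, active
        apply(best)
        for h in active:
            window[h] += scores[h]            # accumulate objective change over the period
        it += 1
        if it % period == 0:
            active = ssr_reduce(active, window, fraction, rng)
            window = {h: 0.0 for h in active}
```

Once removed, a heuristic never returns, which is exactly the limitation of SSR noted in Section 2.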

5. Computational experiments

Each of the hyperheuristics was used 10 times to solve five problem instances. In each run the hyperheuristic was given one attempt to construct a schedule, and the CPU time taken and the solution fitness were recorded. Note that each hyperheuristic has its own stopping criterion, as given in the pseudocode. The instances require the scheduling of 400 tasks using 100 resources over one day using five different skills. Tasks require between one and three skills and resources possess between one and five skills. The problems reflect realistic problems that Trimble have identified and are generated using a commercial problem instance generator (Cowling et al, 2006). The size of the problems was chosen to be solvable in a reasonable amount of CPU time. Over 218 CPU days were needed to complete all the experiments for the problem instances, so the experiments were run in parallel on 88 cores of 22 identical 4-core 2.0 GHz machines. Implementation was done using C# .NET under Microsoft Windows Vista.

We compare the average performance in terms of CPU time and solution quality of the 62 methods based on nine different hyperheuristic approaches with different parameters (rVNS, HyperGreedy, HyperRandom, 9 BEBO ‘Best x’, 7 BEBO ‘Prop x’, 15 ‘standard’ Tabu, 7 Nareyek based, 16 Step-by-Step Reduction and 7 Random Sampling hyperheuristics). Table 3 summarises each of these categories of heuristics. The experimental results (average fitness and CPU time) for each of these hyperheuristics, along with their parameters, are presented in Table 4. Note that, due to limited space, only the best results in terms of CPU time and fitness are shown for BEBO, rTabu and Tabu. The complete results for these three approaches are given in Remde et al (2009). Figure 5 depicts selected results comparing the performance of the hyperheuristics. The table and graph show the results of the different solution methods with various parameters. The line connecting the points in Figure 5 is for the Sample hyperheuristic, which is a naïve approach to trading off CPU time and solution quality; hyperheuristics which make better-than-random choices of low level heuristics lie above this line. For the Sample hyperheuristic, we can see that it can effectively use additional CPU time to generate higher quality solutions, and indeed that it provides a continuum of improving results with increased CPU time. Sample 80% produces better results than HyperGreedy in less CPU time, probably because avoiding consistently applying the same low level heuristics at each iteration provides a useful and effective source of diversification.
Table 3

Summary of hyperheuristics

rVNS: The best of a selection of fast, handcrafted heuristics based on reduced Variable Neighbourhood Search, taken from a large experimental study (Remde et al, 2007).

HyperGreedy: A greedy hyperheuristic that samples all low level heuristics at each iteration and applies the best one.

BEBO Best: A Binary Exponential Back-Off-based hyperheuristic that backs off all but a fixed number of the best performing low level heuristics.

BEBO Prop: A Binary Exponential Back-Off-based hyperheuristic that backs off all low level heuristics except those performing within a percentage of the best performing low level heuristic.

Nareyek: A machine learning hyperheuristic based on the reinforcement learning experiments of Nareyek (2004), which tries a fixed number of the low level heuristics with the highest utility.

Sample: A random hyperheuristic that tries a percentage of the low level heuristics at each iteration and applies the best one.

SSR: The Step-by-Step Reduction hyperheuristic of Chakhlevitch and Cowling (2005), which removes a percentage of the poorest performing low level heuristics every few iterations.

Tabu: A tabu search-based hyperheuristic that selects the best non-tabu low level heuristic and makes a number of poorly performing non-tabu low level heuristics tabu for a fixed number of iterations.

rTabu: As Tabu, but with random tabu tenures.
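The HyperGreedy and Sample rules in Table 3 differ only in which low level heuristics are tried at each iteration. The following is a minimal sketch, assuming that a low level heuristic is a function from a solution to a new solution and that fitness is maximised; the function names are hypothetical, and the stopping criteria and CPU-time bookkeeping of the original implementation are omitted.

```python
import random
from typing import Callable, Sequence, Tuple, TypeVar

S = TypeVar("S")  # solution type (hypothetical)


def apply_best(solution: S, llhs: Sequence[Callable[[S], S]],
               fitness: Callable[[S], float]) -> Tuple[Callable[[S], S], S]:
    """Apply each low level heuristic to `solution`; return the best (heuristic, result).

    Fitness is maximised, as in Tables 4 and 5.
    """
    best_h, best_s, best_f = None, None, float("-inf")
    for h in llhs:
        candidate = h(solution)
        f = fitness(candidate)
        if f > best_f:
            best_h, best_s, best_f = h, candidate, f
    return best_h, best_s


def hyper_greedy_step(solution, llhs, fitness):
    """HyperGreedy: try every low level heuristic and apply the best."""
    return apply_best(solution, llhs, fitness)


def sample_step(solution, llhs, fitness, fraction: float, rng: random.Random):
    """Sample x%: try a random subset of about fraction * len(llhs) heuristics (at least one)."""
    k = max(1, round(fraction * len(llhs)))
    return apply_best(solution, rng.sample(list(llhs), k), fitness)


# Toy usage: "solutions" are integers and each heuristic adds a constant.
toy_llhs = [lambda x, d=d: x + d for d in (1, 5, 3)]
_, greedy_result = hyper_greedy_step(0, toy_llhs, fitness=lambda x: x)
print(greedy_result)  # 5
```

With `fraction=1.0`, Sample behaves exactly like HyperGreedy; smaller fractions trade solution quality for CPU time, which is the continuum seen in Figure 5.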

Table 4

Average fitness and CPU time of each hyperheuristic and parameter setting

| Hyperheuristic | Average fitness | Average time (s) | Fitness % of HyperGreedy | Time % of HyperGreedy |
|---|---|---|---|---|
| rVNS | 21 974.9 | | | |
| HyperGreedy | 24 911.3 | 7 807.2 | | |
| BEBO Best 1 | 24 324.6 | 1 446.1 | | |
| BEBO Best 20 | 24 993.8 | 3 150.9 | | |
| BEBO Prop 0.01% | 24 756.3 | 2 341.1 | | |
| BEBO Prop 0.05% | 24 737.2 | 2 260.3 | | |
| rTabu Best 5 t=50 | 22 872.2 | 2 271.5 | | |
| rTabu Best 10 t=7 | 24 459 | 4 834.1 | | |
| Tabu Best 5 t=50 | 19 139.1 | 1 784.8 | | |
| Tabu Best 10 t=25 | 20 143.4 | 2 534.1 | | |
| Nareyek 1% | 18 308.6 | | | |
| Nareyek 5% | 21 753.9 | | | |
| Nareyek 10% | 22 106 | | | |
| Nareyek 20% | 24 077.5 | 1 436.4 | | |
| Nareyek 40% | 24 298.4 | 1 652.5 | | |
| Nareyek 60% | 24 682 | 2 899.5 | | |
| Nareyek 80% | 24 899.4 | 4 986.7 | | |
| Sample 1% | | | | |
| Sample 5% | 20 796.4 | | | |
| Sample 10% | 22 422.4 | | | |
| Sample 20% | 23 641.1 | 1 211.2 | | |
| Sample 40% | 24 508.9 | 2 598.5 | | |
| Sample 60% | 24 730.8 | 4 034.3 | | |
| Sample 80% | 24 976.3 | 5 708.9 | | |
| SSR 5% t=1 | 20 061 | 1 949.9 | | |
| SSR 5% t=5 | 22 090.3 | 3 997.8 | | |
| SSR 5% t=10 | 23 860.4 | 5 663.7 | | |
| SSR 5% t=20 | 24 595.6 | 6 099.4 | | |
| SSR 10% t=1 | 19 952.1 | 1 585.5 | | |
| SSR 10% t=5 | 19 892.3 | 2 678.9 | | |
| SSR 10% t=10 | 22 454.8 | 4 178.9 | | |
| SSR 10% t=20 | 23 968.5 | 4 847.3 | | |
| SSR 20% t=1 | 18 749.5 | 1 231.9 | | |
| SSR 20% t=5 | 17 491 | 1 991.8 | | |
| SSR 20% t=10 | 18 894.4 | 2 824.4 | | |
| SSR 20% t=20 | 21 389.3 | 3 786.8 | | |
| SSR 50% t=1 | 18 039.5 | | | |
| SSR 50% t=5 | 15 787.8 | 1 173.1 | | |
| SSR 50% t=10 | 16 460.3 | 1 708.1 | | |
| SSR 50% t=20 | 19 006.9 | 2 674.3 | | |

Only the best results for BEBO, rTabu, Tabu in terms of CPU time and fitness are shown. Numbers in bold are the best results in terms of CPU time and fitness for each type of hyperheuristic. Full results for these hyperheuristics can be found in Remde et al (2009). Results show 95% confidence intervals.

Figure 5

Comparison of hyperheuristics which yield solutions having greater than 15 000 fitness on average, with respect to CPU time and average solution quality. Each plotted point represents a parameter setting of the corresponding hyperheuristic.

From Figure 5 and Table 4 we can see that when only a small amount of CPU time is available, a carefully tailored heuristic (the rVNS approach) is clearly superior to hyperheuristic approaches, which have to adapt during search using very little problem-specific information. Indeed, the first of our hyperheuristics (in order of increasing CPU time) to achieve a better fitness is Nareyek 10%, which uses 18 times as much CPU time as rVNS. When more CPU time is available, significantly better results are achievable. For a problem of this type, an improvement of 0.25% in fitness suggests the completion of an additional task (or an equivalent saving in travel time and schedule cost), which is highly significant in practice. The CPU-intensive HyperGreedy approach generates a very high quality solution on average, but it uses large amounts of CPU time and appears to be somewhat wasteful of it compared to other approaches. Sample 80% generates better solutions on average in less CPU time (5 708.9 s against 7 807.2 s), and the BEBO Best 20 hyperheuristic generates better results in less than half the CPU time of HyperGreedy. Other hyperheuristics approach the solution quality of HyperGreedy in a fraction of its CPU time, notably BEBO Prop 0.01%, rTabu Best 10 t=7, Nareyek 80% and SSR 5% t=20, suggesting that all these approaches contain interesting ingredients. Still, HyperGreedy remains a good benchmark against which to judge other approaches.
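The comparisons with HyperGreedy in this paragraph are simple ratios of the Table 4 averages. The small sketch below recomputes them. It reads the "0.25% per additional task" figure as a rough rule of thumb, which is our simplification rather than a precise conversion; the names are illustrative.

```python
# (average fitness, average CPU time in s) for HyperGreedy, from Table 4.
HYPER_GREEDY = (24911.3, 7807.2)

# Rule of thumb from the text: a 0.25% fitness improvement roughly
# corresponds to completing one additional task.
PCT_PER_TASK = 0.25


def compare_to_hypergreedy(fitness: float, cpu_time: float):
    """Return (fitness change in % of HyperGreedy, CPU time as a fraction of HyperGreedy)."""
    hg_fitness, hg_time = HYPER_GREEDY
    return 100.0 * (fitness - hg_fitness) / hg_fitness, cpu_time / hg_time


for name, fit, secs in [("BEBO Best 20", 24993.8, 3150.9),
                        ("Sample 80%", 24976.3, 5708.9)]:
    pct, time_ratio = compare_to_hypergreedy(fit, secs)
    # Report the fitness gain, its rough task equivalent and the relative CPU time.
    print(f"{name}: {pct:+.2f}% fitness (~{pct / PCT_PER_TASK:.1f} tasks), "
          f"{time_ratio:.0%} of HyperGreedy's CPU time")
```

By this measure, BEBO Best 20 improves on HyperGreedy by more than the 0.25% threshold while using about 40% of its CPU time.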

SSR performs poorly in comparison to the other hyperheuristics. Remde et al (2007) provide evidence that some low level heuristics only begin to work well towards the end of a search. It is likely that SSR discards these heuristics because of their low performance at the start of the search, and then fails to find a good solution because they are needed towards the end. It is notable that SSR approaches appear to show near-linear improvement following a poor start, and that SSR approaches which discard only a small percentage of low level heuristics at each reduction step perform well, for modest CPU time savings compared to HyperGreedy.
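The failure mode described above is easy to reproduce with a simplified reduction step. This is a minimal sketch, assuming that a per-heuristic performance score is available at each reduction and that a fixed fraction of the active set is discarded every few iterations (the `every` parameter here; we do not claim it maps exactly onto the t values in Table 4). The toy scoring function is invented for illustration.

```python
import math
from typing import Callable, Dict, Hashable, List


def ssr_reduce(active: List[Hashable], scores: Dict[Hashable, float],
               fraction: float) -> List[Hashable]:
    """Remove the worst-scoring `fraction` of the active heuristics (keep at least one)."""
    n_drop = min(len(active) - 1, math.floor(fraction * len(active)))
    worst_first = sorted(active, key=lambda h: scores[h])
    dropped = set(worst_first[:n_drop])
    return [h for h in active if h not in dropped]


def run_ssr(llhs, score_at: Callable[[Hashable, int], float],
            fraction: float, every: int, n_iterations: int):
    """Reduce the active set every `every` iterations, scoring with score_at(h, it)."""
    active = list(llhs)
    for it in range(1, n_iterations + 1):
        if it % every == 0:
            active = ssr_reduce(active, {h: score_at(h, it) for h in active}, fraction)
    return active


# Toy example: "late" is useless early on but excellent from iteration 10 onwards.
def toy_score(h, it):
    if h == "late":
        return 100.0 if it >= 10 else 0.0
    return {"early": 5.0, "mid": 3.0}[h]


print(run_ssr(["early", "late", "mid"], toy_score, fraction=0.5, every=2, n_iterations=12))
# ['early']: "late" was discarded at iteration 2, before it became useful
```

Once a heuristic has been removed, it is never re-evaluated, so no later scores can bring it back; this is the behaviour suspected of hurting SSR in these experiments.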

The fixed tabu tenure hyperheuristics (Tabu) perform poorly in comparison to the random tabu tenure hyperheuristics (rTabu), supporting the conclusions of Rolland et al (1996) and Kendall and Hussin (2005a). The Tabu results all lie well below the threshold line given by the Sample points, indicating that these approaches perform significantly worse than random choice per second of CPU time. The rTabu results lie only slightly below the Sample threshold line, and they do give modest improvements with increasing CPU time, but they are not competitive with the Sample, Nareyek and BEBO approaches.
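Tabu and rTabu differ only in how the tenure is chosen. Below is a minimal sketch of the selection rule summarised in Table 3. It assumes per-iteration performance scores (higher is better) and treats the number of heuristics made tabu and the tenure range as free parameters; the class and function names are hypothetical.

```python
import random
from typing import Dict, Hashable, Optional, Tuple, Union


class TabuList:
    """Records, for each low level heuristic, the iteration at which it stops being tabu."""

    def __init__(self, tenure: Union[int, Tuple[int, int]],
                 rng: Optional[random.Random] = None):
        # An int gives a fixed tenure (Tabu); a (lo, hi) pair draws each
        # tenure uniformly from lo..hi inclusive (rTabu).
        self.tenure = tenure
        self.rng = rng or random.Random()
        self.until: Dict[Hashable, int] = {}

    def draw_tenure(self) -> int:
        if isinstance(self.tenure, tuple):
            return self.rng.randint(*self.tenure)
        return self.tenure

    def make_tabu(self, h: Hashable, iteration: int) -> None:
        self.until[h] = iteration + self.draw_tenure()

    def is_tabu(self, h: Hashable, iteration: int) -> bool:
        return self.until.get(h, iteration) > iteration


def tabu_step(scores: Dict[Hashable, float], tabu: TabuList,
              iteration: int, n_make_tabu: int) -> Optional[Hashable]:
    """Return the best non-tabu heuristic and make the worst n_make_tabu non-tabu ones tabu."""
    allowed = sorted((h for h in scores if not tabu.is_tabu(h, iteration)),
                     key=lambda h: scores[h], reverse=True)
    if not allowed:
        return None
    for h in allowed[1:][::-1][:n_make_tabu]:  # worst first, never the chosen one
        tabu.make_tabu(h, iteration)
    return allowed[0]


fixed = TabuList(3)
print(tabu_step({"a": 5.0, "b": 1.0, "c": 3.0}, fixed, 0, 1))  # a
print(fixed.is_tabu("b", 2), fixed.is_tabu("b", 3))            # True False
```

Switching from Tabu to rTabu only requires passing a `(lo, hi)` pair instead of an integer. Each heuristic made tabu then receives its own randomly drawn tenure.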

The performance of the Nareyek approaches is consistently better than that of the Sample approach with respect to CPU time and fitness, and Nareyek approaches are capable of beating Sample using similar, small amounts of CPU time. The reinforcement learning technique used is thus shown to be effective, and certainly better than random choice, at selecting good low level heuristics. The CPU times of Sample and Nareyek hyperheuristics that consider a similar number of low level heuristics (Sample 1% and Nareyek 1%, Sample 10% and Nareyek 10%, etc.) are quite different, because some of the better low level heuristics take more CPU time and the Nareyek hyperheuristic identifies these and uses them more frequently than Sample. Hence Nareyek approaches offer some control over CPU time and are the best among those studied at low-to-moderate CPU times.
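The reinforcement scheme summarised in the conclusions (add 1 to utility on success, take the square root on failure) can be sketched as follows. This is a minimal illustration under several assumptions: an initial utility of 1, a heuristic counting as "improved" when it produces a better solution, and the x in "Nareyek x%" being the fraction of highest-utility heuristics tried per iteration. The class name is hypothetical.

```python
import math
from typing import Dict, Hashable, Iterable, List


class NareyekUtilities:
    """Utility-based ranking of low level heuristics (after Nareyek, 2004).

    Positive reinforcement adds 1 to a heuristic's utility; negative
    reinforcement replaces it by its square root. Starting every utility at
    1 (our assumption) keeps all utilities >= 1, so the square root always
    shrinks a utility towards 1 rather than growing it.
    """

    def __init__(self, llhs: Iterable[Hashable], fraction: float):
        self.utility: Dict[Hashable, float] = {h: 1.0 for h in llhs}
        self.fraction = fraction  # share of heuristics tried per iteration

    def candidates(self) -> List[Hashable]:
        """The round(fraction * n) heuristics (at least one) with the highest utility."""
        k = max(1, round(self.fraction * len(self.utility)))
        return sorted(self.utility, key=self.utility.get, reverse=True)[:k]

    def reinforce(self, h: Hashable, improved: bool) -> None:
        if improved:
            self.utility[h] += 1.0
        else:
            self.utility[h] = math.sqrt(self.utility[h])


nu = NareyekUtilities(["a", "b", "c", "d"], fraction=0.5)
for h, improved in [("c", True), ("c", True), ("a", True), ("c", False)]:
    nu.reinforce(h, improved)
print(nu.candidates())  # ['a', 'c']: c fell from 3 to about 1.73, below a's 2
```

Additive rewards combined with square-root penalties make a single failure costly for a heuristic that has built up a high utility. As a result, the ranking adapts quickly when a previously good heuristic stops working.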

The BEBO hyperheuristics cover a narrower range of CPU times, but consistently offer the best solution quality for a given CPU time when moderate to large amounts of CPU time are available. Our implementation of BEBO cannot produce results in smaller amounts of CPU time. Increasing min_backoff could make this hyperheuristic faster, although our experiments show a decline in solution quality in this case. BEBO Prop is slightly less effective than BEBO Best while using slightly more CPU time, which agrees with the conclusions of Remde et al (2009).
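The back-off mechanism behind both BEBO variants can be sketched as below. This is a simplified illustration, resting on three assumptions: scores (higher is better) are available each iteration for the heuristics currently in use; a backed-off heuristic waits for its current back-off and then has that back-off doubled; and BEBO Prop keeps heuristics whose score lies within a proportion p of the best. `min_backoff` mirrors the parameter named in the text, but the rest of the interface is hypothetical, and the published algorithm (Remde et al, 2009) may differ in detail.

```python
from typing import Dict, Hashable, Iterable, List


class BEBO:
    """Binary exponential back-off of low level heuristics (simplified sketch)."""

    def __init__(self, llhs: Iterable[Hashable], min_backoff: int = 1):
        self.min_backoff = min_backoff
        self.backoff: Dict[Hashable, int] = {h: min_backoff for h in llhs}
        self.available_at: Dict[Hashable, int] = {h: 0 for h in self.backoff}

    def available(self, iteration: int) -> List[Hashable]:
        return [h for h, t in self.available_at.items() if t <= iteration]

    def _apply(self, kept, backed_off, iteration: int) -> None:
        for h in kept:  # good performers: reset the back-off
            self.backoff[h] = self.min_backoff
        for h in backed_off:  # poor performers: wait, then double the next wait
            self.available_at[h] = iteration + self.backoff[h]
            self.backoff[h] *= 2

    def update_best(self, scores: Dict[Hashable, float], iteration: int, x: int) -> None:
        """BEBO Best x: back off all but the x best-scoring heuristics."""
        ranked = sorted(scores, key=scores.get, reverse=True)
        self._apply(ranked[:x], ranked[x:], iteration)

    def update_prop(self, scores: Dict[Hashable, float], iteration: int, p: float) -> None:
        """BEBO Prop: keep heuristics scoring within a proportion p of the best."""
        best = max(scores.values())
        threshold = best - p * abs(best)
        kept = [h for h in scores if scores[h] >= threshold]
        self._apply(kept, [h for h in scores if h not in kept], iteration)


b = BEBO(["a", "b", "c"])
b.update_best({"a": 3.0, "b": 2.0, "c": 1.0}, iteration=0, x=1)
print(b.available(0), b.available(1))  # ['a'] ['a', 'b', 'c']
b.update_best({"a": 3.0, "b": 2.0, "c": 1.0}, iteration=1, x=1)
print(b.available(2))  # ['a']  (b and c now wait two iterations)
```

In this sketch, a larger `min_backoff` keeps poor heuristics out of the evaluation for longer from the start, which is consistent with the text's observation that it can make BEBO faster at some cost in quality.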

Table 5 and Figure 6 show the Pareto frontier of these results which contains the heuristics giving optimal trade-offs between CPU time and solution quality. As mentioned earlier, the dominance of the tailored rVNS heuristic for small CPU time is apparent, then for low to moderate CPU time Nareyek and Sample dominate. The appearance of the random Sample approach in the Pareto frontier is unexpected, but an inspection of Figure 5 shows that there are few approaches consuming CPU time in this range. The absence of a Nareyek hyperheuristic in this range is due to the difficulty of tuning the Nareyek parameters to obtain a precise CPU time when the hyperheuristic is choosing between low level heuristics which take very different amounts of CPU time. This difficulty of tuning the CPU time applies also to BEBO and Tabu approaches. BEBO dominates the other methods when more CPU time is available; indeed the BEBO Best 20 approach dominates approaches which consume more than twice as much CPU time (which are shown in Figure 5 but not in Figure 6).
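Non-dominance here is the usual two-objective notion: higher average fitness and lower average CPU time are both preferred. The short sketch below recomputes a frontier from a subset of the reported averages. It is restricted to methods whose CPU times are available in Tables 4 and 5, so it cannot reproduce the low-CPU-time rows of Table 5. The function names are illustrative.

```python
from typing import Dict, List, Tuple

# (average fitness, average CPU time in s) from Tables 4 and 5, for methods
# whose CPU time is reported.
RESULTS: Dict[str, Tuple[float, float]] = {
    "Sample 20%": (23641.1, 1211.2),
    "Nareyek 20%": (24077.5, 1436.4),
    "BEBO Best 1": (24324.6, 1446.1),
    "BEBO Best 2": (24588.6, 1775.4),
    "BEBO Best 3": (24774.3, 2043.6),
    "BEBO Prop 0.01%": (24756.3, 2341.1),
    "Nareyek 60%": (24682.0, 2899.5),
    "BEBO Best 20": (24993.8, 3150.9),
    "rTabu Best 10 t=7": (24459.0, 4834.1),
    "Sample 80%": (24976.3, 5708.9),
    "HyperGreedy": (24911.3, 7807.2),
}


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """a dominates b: fitness no worse, CPU time no worse, and strictly better in one."""
    (fa, ta), (fb, tb) = a, b
    return fa >= fb and ta <= tb and (fa > fb or ta < tb)


def pareto_front(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of the non-dominated methods, in input order."""
    return [name for name, point in results.items()
            if not any(dominates(other, point)
                       for o, other in results.items() if o != name)]


print(pareto_front(RESULTS))
```

On this subset, the non-dominated set is Sample 20%, Nareyek 20% and BEBO Best 1, 2, 3 and 20, matching the corresponding rows of Table 5. BEBO Prop 0.01% and Nareyek 60% drop out because BEBO Best 3 is both faster and fitter.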
Table 5

Pareto optimal methods with respect to average fitness and CPU time

| Hyperheuristic | Average fitness | Average time (s) |
|---|---|---|
| rVNS | 21 974.9 | |
| Nareyek 1% | 18 308.6 | |
| Nareyek 5% | 21 753.9 | |
| Nareyek 10% | 22 106.0 | |
| Sample 10% | 22 422.5 | |
| Sample 20% | 23 641.1 | 1 211.2 |
| Nareyek 20% | 24 077.5 | 1 436.4 |
| BEBO Best 1 | 24 324.6 | 1 446.1 |
| BEBO Best 2 | 24 588.6 | 1 775.4 |
| BEBO Best 3 | 24 774.3 | 2 043.6 |
| BEBO Best 10 | 24 782.8 | 2 572.5 |
| BEBO Best 15 | 24 869.2 | 2 825.4 |
| BEBO Best 20 | 24 993.8 | 3 150.9 |

Figure 6

Pareto optimal set of heuristics showing non-dominated solutions with respect to average CPU time and solution quality.

The relative effectiveness of each hyperheuristic was investigated using analysis of variance (ANOVA) and Tukey comparisons (Miller, 1997), and the results are shown in Table 6. The statistic considered for each hyperheuristic, labelled ‘fitness’ in the table, is the absolute deviation of the best solution produced in a hyperheuristic run from the best known solution. Using the problem instance as the explanatory variable produced a coefficient of determination of 0.004, indicating that only 0.4% of the variability can be explained by the problem instances, with 95% confidence, so we are justified in considering the different problem instances together. Performing ANOVA with the solution method as the explanatory variable showed that 74.5% of the variance can be explained by the solution method used, with 95% confidence. Table 6 summarises the Tukey pairwise comparisons, using 95% confidence intervals. It shows that a variety of methods perform well if we consider only the fitness of the best solution produced by each method: the methods in group A cannot be separated with 95% confidence based upon fitness alone. When we also consider solution quality per unit of CPU time, the benefits of a more focused search in approaches such as BEBO Best 20, BEBO Prop 0.01%, Sample 40% and Nareyek 60% become clear. For this large, real-world problem, CPU-efficient approaches are important, and since the data in the table already represent over six months of CPU time, it would be very difficult in this case to find a smaller top-ranking group with 95% confidence. Overall, it appears that the BEBO, Sample and Nareyek methods outperform the SSR and Tabu hyperheuristics in terms of fitness alone, and even more so if we also consider CPU time.
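For a one-way ANOVA, the coefficient of determination is the between-group sum of squares divided by the total sum of squares. An R² of 0.004 for the instance factor therefore means that instance-to-instance differences account for almost none of the spread in the statistic. The minimal sketch below shows that computation. It uses toy data rather than the experimental results and does not reproduce the Tukey comparisons.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def abs_deviation_from_best(best_per_run: Sequence[float], best_known: float) -> List[float]:
    """The per-run statistic used here: |best known fitness - best fitness found in the run|."""
    return [abs(best_known - f) for f in best_per_run]


def r_squared(groups: Dict[str, Sequence[float]]) -> float:
    """Coefficient of determination of a one-way ANOVA: SS_between / SS_total.

    groups maps each level of the explanatory variable (e.g. a solution
    method or a problem instance) to the statistics observed at that level.
    """
    values = [v for g in groups.values() for v in g]
    grand_mean = fmean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    return ss_between / ss_total


# Toy data (not from the paper): two methods, three runs each.
toy = {"method A": [1.0, 2.0, 3.0], "method B": [4.0, 5.0, 6.0]}
print(round(r_squared(toy), 3))  # 0.771
```

The same function applies to either explanatory variable: grouping runs by problem instance gives the figure used to justify pooling instances, and grouping by solution method gives the share of variance attributed to the method.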
Table 6

Tukey analysis of hyperheuristic methods using a 95% confidence interval

6. Conclusions

This paper investigates and compares several hyperheuristic approaches which handle a large number of low level heuristics, on a difficult real-world scheduling problem. Our low level heuristics are generated by combining parameterised components for (i) choosing which task to schedule and (ii) allocating resources to the chosen task, giving a large set of over 200 low level heuristics. Since a large number of low level heuristics are available, our intuition suggests that at each step the solution landscape is amenable to at least one of them. Our results, in comparison with a tailored reduced Variable Neighbourhood Search approach and an approach which greedily searches through all low level heuristics at each step, suggest that this is an effective way to produce high quality solutions, albeit using large amounts of CPU time.
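Building the low level heuristic set as a Cartesian product of components is what makes the collection grow so quickly: the number of heuristics is the product of the number of options for each component. The small sketch below uses hypothetical component names and parameter values; the actual components are not listed in this section.

```python
from itertools import product
from typing import Dict, List

# Hypothetical components: these names and parameter values are invented for
# illustration and are not the components used in the study.
TASK_ORDERINGS = ["priority", "deadline", "duration"]          # how to rank candidate tasks
TASK_LOOKAHEAD = [1, 2, 5]                                      # how many top tasks to try
RESOURCE_RULES = ["nearest", "least_loaded", "cheapest", "skill_match"]


def build_llh_specs() -> List[Dict[str, object]]:
    """One low level heuristic per combination of task and resource components."""
    return [
        {"order_tasks_by": o, "lookahead": k, "allocate_by": r}
        for o, k, r in product(TASK_ORDERINGS, TASK_LOOKAHEAD, RESOURCE_RULES)
    ]


specs = build_llh_specs()
print(len(specs))  # 36 = 3 * 3 * 4
```

Adding a few options to any one component multiplies the total. This illustrates how a set of over 200 low level heuristics can arise from a modest number of building blocks.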

The problem we consider contains features common to many real-world scheduling problems which deal with mobile resource management (occurring in pick up/delivery, project management, routing and maintenance applications). The approach to generating low level heuristics can generalise across these problems, so that our hyperheuristic approaches and results give an indication of the effectiveness of these techniques for a wide range of complex mobile workforce scheduling problems.

When using a large collection of low level heuristics, we must decide between different low level heuristics at each iteration rather than choosing directly between different solutions. Several methods are presented for choosing between low level heuristics, which are generally applicable since they make use only of fitness information. A thorough empirical investigation is undertaken to determine the effectiveness of these techniques in using increasing amounts of CPU time to effectively generate high quality solutions.

Tabu search-based methods with fixed tabu tenures are significantly outperformed by tabu search-based methods with random tenures in a fixed range. However, both of these tabu search-based hyperheuristics underperform a random sampling approach given similar CPU times, suggesting that their choice of low level heuristics is worse than choosing the best of a random sample of low level heuristics at each iteration. Ranking methods which use adaptive reinforcement of low level heuristics, based upon their fitness performance, perform much better. We study a method based on that of Nareyek, where positive reinforcement adds 1 to utility and negative reinforcement takes the square root of utility, and we preferentially choose low level heuristics which have high utility. This method performs well when low to medium CPU time is available, and surpasses the random sampling method. The BEBO method increases the tabu tenure of a poorly performing low level heuristic exponentially, and resets the tenure to a small value for low level heuristics which perform well. This approach is dominant for medium to high amounts of CPU time and again beats a random sampling approach on average for a given CPU time.

An SSR method, which discards poorly performing low level heuristics during search, does not perform well in this case, where the number of iterations needed to generate a solution is small, since it appears to discard, early in the search, low level heuristics that become important later. While this approach works reasonably well when the number of heuristics discarded is small and CPU time is high, it is outperformed by a random sampling technique when the number of low level heuristics discarded is high.

When analysing the Pareto frontier representing the best trade-off between fitness and CPU time, we find that the Nareyek-based approaches dominate for smaller CPU times and the BEBO-based approaches dominate for large amounts of CPU time. ‘Gaps’ in the CPU time used arise from the difficulty of precisely controlling the total CPU time of a hyperheuristic that chooses between low level heuristics consuming variable amounts of CPU time. These gaps are filled by random sampling approaches rather than by SSR and tabu search approaches.

Overall, this paper shows that hyperheuristics provide an effective way to trade off CPU time for solution fitness when solving complex real-world scheduling problems, and it provides empirical comparisons between a wide range of hyperheuristic approaches (62 parameter sets of nine different hyperheuristic approaches). In further work, it will be interesting to extend the range of hyperheuristics investigated to approaches which are effective when the user stops the search and the amount of CPU time is not known in advance.


  1. Bai R and Kendall G (2005). An investigation of automated planograms using a simulated annealing based hyperheuristic. In: Ibaraki T, Nonobe K and Yagiura M (eds). Meta-heuristics: Progress as Real Problem Solvers, Selected Papers from the 5th Metaheuristics International Conference (MIC 2003). Springer: New York, pp 87–108.
  2. Baptiste P, Le Pape C and Nuijten W (2001). Constraint Based Scheduling. Kluwer Academic Publishers: London.
  3. Burke E, Hyde M, Kendall G, Ochoa G, Özcan E and Woodward J (2010). A classification of hyperheuristic approaches. In: Gendreau M and Potvin J-Y (eds). Handbook of Metaheuristics. Springer: New York, pp 449–468.
  4. Burke E, Kendall G and Soubeiga E (2003). A tabu-search hyperheuristic for timetabling and rostering. J Heuristics 9 (6): 451–470.
  5. Chakhlevitch K and Cowling P (2005). Choosing the fittest subset of low level heuristics in a hyperheuristic framework. In: Raidl G and Gottlieb J (eds). Proceedings of Evolutionary Computation in Combinatorial Optimization. Lecture Notes in Computer Science, Vol. 3448. Springer: New York, pp 23–33.
  6. Chakhlevitch K and Cowling P (2008). Hyperheuristics: Recent developments. In: Cotta C, Sevaux M and Sorensen K (eds). Adaptive and Multilevel Metaheuristics, Studies in Computational Intelligence. Vol. 136. Springer: New York, pp 3–29.
  7. Colledge N (2009). Evolutionary approaches to dynamic mobile workforce scheduling. PhD thesis, University of Bradford, UK.
  8. Cowling P and Chakhlevitch K (2003). Hyperheuristic for managing a large collection of low level heuristics to schedule personnel. In: Proceedings of the 2003 IEEE Congress on Evolutionary Computation (CEC2003). IEEE Press: New York, pp 1214–1221.
  9. Cowling P, Colledge N, Dahal K and Remde S (2006). The trade off between diversity and quality for multi-objective workforce scheduling. In: Gottlieb J and Raidl G (eds). Proceedings of Evolutionary Computation in Combinatorial Optimization. Lecture Notes in Computer Science, Vol. 3906. Springer: New York, pp 13–24.
  10. Cowling P, Kendall G and Soubeiga E (2001). A hyperheuristic approach to scheduling a sales summit. In: Proceedings of Selected Papers from the 3rd International Conference on the Practice and Theory of Automated Timetabling (PATAT 2000). Lecture Notes in Computer Science, Vol. 2079. Springer: New York, pp 176–190.
  11. Fang H, Ross P and Corne D (1994). A promising hybrid GA/heuristic approach for open-shop scheduling problems. In: Cohn A (ed). Proceedings of the 11th European Conference on Artificial Intelligence. Wiley: New York, pp 590–594.
  12. Glover F and Laguna M (1997). Tabu Search. Springer: New York.
  13. Kaelbling L, Littman M and Moore A (1996). Reinforcement learning: A survey. J Artif Intell Res 4: 237–285.
  14. Kendall G and Hussin N (2005a). A tabu search hyperheuristic approach to the examination timetabling problem at the MARA university of technology. In: Burke E and Trick M (eds). Proceedings of Selected Papers from the 5th International Conference on the Practice and Theory of Automated Timetabling (PATAT 2004). Lecture Notes in Computer Science, Vol. 3616. Springer: New York, pp 270–293.
  15. Kendall G and Hussin N (2005b). An investigation of a Tabu search based hyperheuristic for examination timetabling. In: Kendall G, Burke E and Petrovic S (eds). Proceedings of the Multidisciplinary Scheduling: Theory and Applications Conference. Springer: New York, pp 309–328.
  16. Kendall G, Han L and Cowling P (2002). An investigation of a hyperheuristic genetic algorithm applied to a trainer scheduling problem. In: Fogel D, El-Sharkawi M, Yao X, Greenwood G, Iba H, Marrow P and Shackleton M (eds). Proceedings of Congress on Evolutionary Computation 2002. IEEE Press: New York, pp 1185–1190.
  17. Kolisch R and Hartmann S (2006). Experimental investigation of heuristics for resource-constrained project scheduling: An update. Eur J Opl Res 174 (1): 23–37.
  18. Kwak B, Song N and Miller L (2005). Performance analysis of exponential backoff. IEEE/ACM Trans Networking 13 (2): 343–355.
  19. Laguna M, Marti R and Campos V (1999). Intensification and diversification with elite tabu search solutions for the linear ordering problem. Comput Opns Res 26 (12): 1217–1230.
  20. Mladenović N and Hansen P (1997). Variable neighborhood search. Comput Opns Res 24 (11): 1097–1100.
  21. Miller R (1997). Beyond ANOVA: Basics of Applied Statistics. Texts in Statistical Science Series. Chapman & Hall: London.
  22. Nareyek A (2004). Choosing search heuristics by non-stationary reinforcement learning. In: Resende M and de Sousa J (eds). Metaheuristics: Computer Decision-Making, Vol. 86. Kluwer Academic Publishers: Dordrecht, The Netherlands, pp 523–544.
  23. Pinedo M and Chao X (1999). Operations Scheduling with Applications in Manufacturing and Services. McGraw-Hill: New York.
  24. Remde S, Cowling P, Dahal K and Colledge N (2007). Exact/heuristic hybrids using rVNS and hyperheuristics for workforce scheduling. In: Cotta C and van Hemert J (eds). Proceedings of Evolutionary Computation in Combinatorial Optimization. Lecture Notes in Computer Science, Vol. 4464. Springer: New York, pp 188–197.
  25. Remde S, Cowling P, Dahal K and Colledge N (2009). Binary exponential back off for tabu tenure in hyperheuristics. In: Cotta C and Cowling P (eds). Proceedings of Evolutionary Computation in Combinatorial Optimization. Lecture Notes in Computer Science, Vol. 5482. Springer: New York, pp 109–120.
  26. Rolland E, Schilling D and Current J (1996). An efficient tabu search procedure for the p-median problem. Eur J Opl Res 96: 329–342.
  27. Toth P and Vigo D (2001). The vehicle routing problem. SIAM Monographs on Discrete Mathematics and Applications, Philadelphia, PA, USA.

Copyright information

© Operational Research Society 2011

Authors and Affiliations

  • S Remde (1)
  • P Cowling (1)
  • K Dahal (1)
  • N Colledge (1)
  • E Selensky (2)
  1. University of Bradford, Bradford, UK
  2. Trimble MRM Ltd. (EMEA), Ipswich, UK
