Combining hybrid genetic search with ruin-and-recreate for solving the capacitated vehicle routing problem

The Capacitated Vehicle Routing Problem (CVRP) has been subject to intense research efforts for more than sixty years. Yet, significant algorithmic improvements are still being made. The most competitive heuristic solution algorithms of today utilize, and often combine, strategies and elements from evolutionary algorithms, local search, and ruin-and-recreate based large neighborhood search. In this paper we propose a new hybrid metaheuristic for the CVRP, where the education phase of the hybrid genetic search (HGS) algorithm proposed by (Vidal Hybrid Genetic Search for the CVRP: Open-Source Implementation and SWAP* Neighborhood 2020) is extended by applying large neighborhood search (LNS). By performing a series of computational experiments, we attempt to answer the following research questions: 1) Is it possible to gain performance by adding LNS as a component in the education phase of HGS? 2) How does the addition of LNS change the relative importance of the local search neighborhoods of HGS? 3) What is the effect of devoting computational efforts to the creation of an elite solution in the initial population of HGS? Through a set of computational experiments we answer these research questions, while at the same time obtaining a good configuration of global parameter settings for the proposed heuristic. Testing the heuristic on benchmark instances from the literature with limited computing time, it outperforms existing algorithms, both in terms of the final gap and the primal integral.


Introduction
The Capacitated Vehicle Routing Problem (CVRP) was first introduced by Dantzig and Ramser (1959) as the truck dispatching problem. Since then, the CVRP has been one of the most studied problems in operations research and combinatorial optimization. It consists of finding a set of vehicle routes, starting and ending at a depot, to deliver goods to a set of geographically dispersed customers. A feasible solution requires each customer to be visited by exactly one of the identical vehicles, and the total amount of goods delivered by a vehicle cannot exceed its capacity. The objective is to design the routes in such a way that the total distance traveled by the vehicles is minimized. For a more formal description, and mathematical formulations, of the problem we refer to Laporte (2009).
Since its introduction in 1959, numerous extensions and variants of the CVRP, considering a wide range of real-world aspects, have been studied in the literature (Laporte 2009). However, the CVRP still plays a central role due to its relative simplicity, which makes it easier to test out new ideas for both exact and heuristic solution methods, allowing for algorithmic improvements that can later be applied to more complex problem variants. Therefore, the focus of this paper is to improve the heuristic solution methods for the CVRP. For further introduction to richer vehicle routing problems we refer to the taxonomy by Eksioglu et al. (2009), the survey by Vidal et al. (2013), or the book by Toth and Vigo (2014).
The CVRP and all its extensions are known to be NP-hard (Lenstra and Kan 1981), a complexity that is severely limiting the size of the problems that can be solved to proven optimality. The state-of-the-art exact solution methods are the branch-and-cut-andprice algorithms by Pecin et al. (2017) and Pessoa et al. (2020), which combines cut and column generation with several additional mechanisms. In general, the exact methods are able to solve most instances of up to 275 customers, although the computational time required is often several days, making the methods impractical in many settings. As a consequence, a considerable part of the literature consists of heuristic solution methods, where the most successful ones are metaheuristics.
The term "metaheuristic" was first coined by Glover (1986), and it has since been used to describe heuristics with additional strategies enabling search past any encountered local optima. A huge number of metaheuristics have been proposed for the CVRP, and they are usually classified into three main types that are neither disjoint nor exhaustive: neighborhood-centered searches, population-based methods, and hybrid metaheuristics (Laporte et al. 2014). Neighborhood-centered metaheuristics utilize a local search procedure, local improvement heuristics, and strategies for escaping local optima, to improve a single incumbent solution. Population-based methods maintain a set of solutions throughout the search process, where in each iteration, information from multiple solutions may be used to create new ones. Hybrid metaheuristics combine multiple approaches. The current state-of-the-art metaheuristics for the CVRP include: Hybrid Iterated Local Search (HILS) (Subramanian et al. 2013), Knowledge Guided Local Search (KGLS) , Slack Induction by String Removals (SISR) (Christiaens and Vanden Berghe 2020), and Hybrid Genetic Search (HGS) (Vidal et al. 2012;Vidal 2020).
HILS is a hybrid metaheuristic that combines the neighborhood-centered Iterated Local Search (ILS) (Baxter 1981) with an exact algorithm for solving the set partitioning problem. ILS escapes local optima by perturbing the solution to achieve a new starting point for the local improvement heuristic. One iteration of HILS first uses ILS to find a locally optimal solution where some of the solution's routes are added to a route pool, before the exact algorithm finds the optimal partition of routes from the pool. Voudouris and Tsang (2003) proposed the Guided Local Search (GLS) metaheuristic, which penalizes "bad" features of the solution, for instance edges, effectively changing the objective function. KGLS builds upon GLS using knowledge obtained through data mining to find the most important features of optimal or near-optimal solutions, and this information determines which edges to penalize. SISR, proposed in Christiaens and Vanden Berghe (2020), is based on a local search heuristic guided by Simulated Annealing (Kirkpatrick et al. 1983), in addition to a mechanism for minimizing the vehicle fleet. Instead of using classical neighborhoods for the VRP, it performs large neighborhood search (LNS) defined by ruin-and-recreate (R&R) operators. Finally, HGS is a population-based method extending the genetic algorithm proposed by Prins (2004). It maintains a diverse population of individual solutions. In each iteration, these individuals are recombined and the resulting offspring are "educated" with a local search heuristic. Vidal et al. (2012) presented a version of HGS called Hybrid Genetic Search with Advanced Diversity Control (HGSADC), and an improved version specialized for the CVRP, namely HGS-CVRP, was proposed in Vidal (2020). This paper focuses on the latter and refer to it as HGS.
Several improvements have been made on metaheuristics for the CVRP during recent years, although most of the improvements have not come from new metaheuristic concepts, but rather from more efficient and focused local improvement heuristics. HGS achieved a significant performance improvement by means of an additional neighborhood in the local search, as reported by Vidal (2020). Similarly, the special R&R operators proposed in Christiaens and Vanden Berghe (2020) arguably constitute the most important factor for the success of the SISR metaheuristic. However, introducing more neighborhoods to a local search does not guarantee improvement, as it is only beneficial if the neighborhoods are complementary. The intersection between neighborhoods should be small or empty so that each neighborhood tends to find different solutions. Additionally, the size of the neighborhoods must be relatively small such that the computational complexity of exploring them is manageable. These challenges may be summarized in the fundamental balance between intensification and diversification, which is vital for a metaheuristic to be successful (Glover 1989).
The overall goal of this paper is to investigate whether combining HGS and LNS is a way to create a competitive solution method for the CVRP. To this end, we formulate the following research questions: 1. Is it possible to gain performance by adding LNS as a component in the education phase of HGS?
2. How does the addition of LNS change the relative importance of the local search neighborhoods of HGS? 3. What is the effect of devoting computational efforts to the creation of an elite solution in the initial population of HGS?
To answer these questions, we design a metaheuristic called Hybrid Genetic Search with Ruin-and-Recreate (HGSRR) and a sequence of computational experiments. HGSRR combines elements from the latest version of HGS (Vidal 2020) and the recently proposed SISR metaheuristic by Christiaens and Vanden Berghe (2020) that is based on LNS.
The remainder of this paper is structured as follows. Section 2 describes the proposed hybrid metaheuristic and its components. The sequence of experiments and the final benchmark test comparing the hybrid metaheuristic against the state-of-the-art are presented in Section 3. Finally, in Section 4, concluding remarks are presented in addition to suggestions for future research.

Hybrid genetic search with ruin-and-recreate
In short, HGSRR is a genetic algorithm that iterates a cycle consisting of seven components divided into three main phases: Recombination, Education, and Population Management. Six of the components are the same as in the original HGS, while the seventh is responsible for search in the R&R-based neighborhood from SISR. Fig. 1 shows a flowchart of the hybrid method.
The Recombination phase is made up of two components that are responsible for the recombination of individuals. Component (1) selects two parent solutions from the population, whereas component (2) performs crossover of the selected parents in order to recombine them into an offspring solution.
The Education phase is responsible for improving the offspring solution. Because the solutions in the recombination phase are represented as giant tours, a split algorithm is applied to the offspring solution in component (3). A giant tour contains all customers on a single tour without considering the capacity constraint, and to restore feasibility, the split algorithm is used to divide the giant tour into feasible routes in an optimal way. Component (4) then runs a local search on the solution until a local optimum is found, and component (5) continues to search in the R&R-based neighborhood, attempting to further improve the solution. It should be noted that even though component (3) creates feasible vehicle routes with respect to capacity, exceeding the capacity during the Education phase is permitted in exchange of a penalty, potentially leading to infeasible solutions. The penalty is dynamically adjusted based on a target fraction of ξ R E F feasible individuals in the population. In addition, if the solution remains infeasible after the Education phase, there is a 50% probability for the phase to restart with a ten times higher penalty coefficient.
Once the offspring solution is educated, the final Population Management phase starts. It includes two components, where component (6) first evaluates the educated offspring and calculates its fitness relative to the population, before component (7) determines if the offspring is among the individuals transferred to the next generation. The three phases are then repeated and the genetic cycle continues until either the predetermined time limit T M AX is reached, or N I T iterations are performed without improvement. In the former case, the algorithm is terminated, while in the latter case, the algorithm is restarted.
To initialize the population, the method starts by generating μ I individuals, where each individual is a random giant tour before the tour is split and the individual is educated, as in the genetic cycle. However, in contrast to the original HGS, a single individual goes through elite education that includes an intensified version of the R&R search in component (5). The purpose of the elite education is to quickly find a high quality solution, thus improving the convergence of the hybrid method. Unlike the local search in component (4), the search based on the R&R operators of the SISR metaheuristic may accept worsening moves, allowing it to escape local optima, and thus it can be used to continue improvement of a given solution.
The HGSRR with its elite education can be seen as a combination between the population-based HGS and the neighborhood-centered SISR. If no time is spent on Fig. 2 Illustration of the ordered crossover operator. Two parents, represented on the left, are combined to form the offspring solution to the right. The dashed lines is the crossover region elite education, the HGSRR is the same as an extended version of HGS, and otherwise, if the elite education takes up all the time, the HGSRR is similar to the SISR metaheuristic without the fleet minimization mechanism. The balance between the two is explored in this paper, and results from the experiments are presented in Section 3. A short description of each component used in the HGSRR is given in the following subsections.

Parent selection
A tournament-based selection scheme is used to select a parent, where k ≥ 2 individuals are randomly selected from the population. Out of these k individuals, the one with the best fitness is selected as the tournament winner. In HGSRR, k = 2 is used, as higher values of k increase the likelihood of the best solutions being selected, implying a higher selection pressure, leading to less diversity. Since we use binary crossover, two parent solutions are required, and therefore, the tournament selection is performed twice.

Crossover
An ordered crossover operator (Oliver et al. 1987), where the offspring solution inherits parts of the order from each of the two parent solutions, is used to recombine the two parents using their giant tour representation. Fig. 2 illustrates the ordered crossover operator. A random crossover region is selected, in which the customers within the region are copied from the giant tour of the first parent. Starting outside the crossover region, the remaining customers are copied in order from the giant tour of the second parent, possibly wrapping around.

Split
Because individuals are represented as a giant tour in the crossover, a dynamic programming algorithm called Split, is used to optimally divide the giant tour into segments defining feasible vehicle routes. This approach was first presented and used by Prins (2004).
A giant tour can be represented as a directed acyclic graph (DAG), and the process of splitting the giant tour can be seen as finding the shortest path in the DAG, illustrated  Fig. 3. In the example, the giant tour consists of five customers in order, where the arcs represent feasible routes and the arc cost corresponds to the length of the route. We see that there is an arc from the depot to the second node, but not to the third. This implies that a route with the first two customers is feasible, and the absence of an arc to the third customer means that a route with the first three customers would violate the capacity constraint. The optimal split is given by the set of arcs making up the shortest path in the graph, and it can be found in O(n 2 ) using Bellman's algorithm, as shown by Prins (2004). However, like in the HGS by Vidal (2020), the specific implementation relies on a linear-time split algorithm exploiting dominance rules, as described in Vidal (2016).

Local search
A local search is performed to improve the offspring solution, in which ten complementary neighborhoods are explored in order using a first-improvement strategy. I.e., the local search cycles through the different types of neighborhoods in order and immediately applies the encountered moves if they improve the solution. These neighborhoods are the same as those used in the latest version of HGS by Vidal (2020). Nine of the neighborhoods were first presented in Prins (2004) and they are denoted (M1 -M9). The neighborhoods include relocation of one or two vertices, swap of one or two vertices, in addition to the intra-route 2-opt, and two versions of the inter-route 2-opt* neighborhoods. These neighborhoods are defined by the moves for all pairs of distinct vertices, and consequently, the neighborhoods can generally be explored in O(n 2 ). However, a move is only considered if it involves a pair of vertices that are geographically close, and the granularity parameter is used to restrict the neighborhoods, allowing for an exhaustive exploration in O( n).
The tenth and final neighborhood is defined by the SWAP* move proposed by Vidal (2020). Following the same notation as Prins (2004) and Vidal et al. (2012), this is denoted M10. It is defined for pairs of routes with overlapping circle sectors. For a pair of routes, one customer from each route is swapped, however, instead of inserting the customer in place of the other, the customers are optimally relocated into the opposite route. Despite generally containing O(n 4 ) moves, the neighborhood can be explored in sub-quadratic time. For further details on the last neighborhood, the reader is referred to Vidal (2020).

R&R search
The R&R search distinguishes HGSRR from the original HGS by Vidal (2020). Its purpose is to further improve the offspring solutions after the local search. It searches in the large neighborhood defined by the new R&R operators proposed in Christiaens and Vanden Berghe (2020). A single R&R move from a solution s results in the solution s * , and is performed by first removing a set of customers, before the customers are inserted back into the solution.
The removal operator is called Adjacent String Removal, and it attempts to induce two important types of slack into the solution. A sufficient number of customers must be removed from each route to induce capacity slack in the route, as other customers that are more suitable to be served by the route cannot be feasibly added unless the needed capacity is freed up. The second type of slack is the spatial slack, given by the reachable area of the vehicle operating the route. Even though the distance driven by a vehicle is unlimited in the CVRP, it can be seen as a resource like the capacity, but instead of being a constraint, it is minimized by the objective function. Removing customers from a route induces spatial slack by increasing the reachable area of the route such that other customers can be visited without exceeding the original distance. To induce both types of slack, the operator starts with a randomly chosen customer from which it removes strings of adjacent customers that are geographically close across multiple routes. The selection of the adjacent strings are controlled by two parameters where c is the average number of customers to remove and L M AX is the maximum length for a single string of customers.
The recreate operator is based on the greedy insertion approach and is called Greedy Insertion with Blinks. It inserts the removed customers sequentially, and the order of the sequence is determined by sorting the removed customers in one of four ways: Random, Demand, Close, and Far. The first is random order, while the second starts by inserting the removed customer with the largest demand. The last two ways sort the customers based on their distance to the depot, starting with the customer either closest or farthest away. Instead of always inserting the customer in the position with lowest additional cost, the operator searches through all feasible insertion points but may "blink", i.e., disregard a given point, with a probability of β. The effect is that an insertion point with rank r according to cost has probability p(r ) = (1 − β)β (r −1) of being chosen. The blinking may come across as counter-intuitive, as not inserting a customer into the best position may result in a worse solution. However, the blinking introduces some randomness to the operator, reducing the greediness compared to only choosing the best position, and Christiaens and Vanden Berghe (2020) experienced that the probabilistic blink element improved the overall performance of the SISR metaheuristic. For further details on the two operators, we refer to the original paper by Christiaens and Vanden Berghe (2020).
Like in the SISR metaheuristic, a simulated annealing algorithm with an exponential cooling schedule is used to control the search such that the probability for accepting a move to a new solution s * is dependent on the current temperature T and the difference in the objective values z = z(s) − z(s * ). Let T be the temperature variable, T 0 the initial temperature, and T f the final temperature. The number of moves N R in the R&R search is a linear function of the instance size n determined by the parameter γ : N R = γ n . The cooling factor c follows from N R , T 0 , and T f : A pseudocode of the R&R component is presented in Algorithm 1. First, the input solution is saved as the best solution found during the R&R search, s B (line 1). The temperature T is set to the starting temperature T 0 (line 2). For every iteration in the algorithm, solution s * is determined by applying the R&R operators from SISR (lines 3 and 4). If the move is accepted in line (5), the current solution s is updated in line (6). Additionally, if the new solution is the best found during the search, s B is updated (lines 7 and 8). The last step in one iteration of the search is to reduce the temperature by multiplying it with the cooling factor in line (9). Finally, in line (10), the best solution found during the R&R search is returned.

Algorithm 1: R&R Search
Input: s ← Offspring solution after local search T ← cT 10 return s B

Fitness calculation
The individual's fitness is used to measure its quality relative to the other individuals in the population. To balance the diversification and intensification of the search, the fitness combines two measures: the objective value of the solution and the individual's contribution to the population diversity. The latter is calculated as the average normalized Hamming distance to the other individuals in the population. Additionally, since the solutions may become infeasible after the education phase due to the relaxation of the capacity constraint, two subpopulations are maintained, i.e., one for feasible and one for infeasible solutions. Each individual p ∈ P in the subpopulation is given ranks r O B J p and r DI V p , corresponding to the two measures, respectively. A slightly smaller weight is given to the diversity contribution to ensure that the N E individuals in the subpopulation with the best objective values always survive. The formula for the fitness is shown Equation (1).

Survivor selection
A generational model is used to manage the subpopulations, in which μ is defined as the minimum population size, and λ is the generation size. Therefore, both subpopulations manage between μ and μ + λ individuals each. After an offspring is educated, its fitness is calculated and it is inserted into the respective subpopulation based on feasibility. All the individuals in the subpopulation are kept unless the size of the subpopulation exceeds the upper limit of μ + λ individuals. Once the upper limit is reached, individuals are removed from the subpopulation based on two criteria. First, any clones are removed, and second, if all individuals are unique, the individual with the worst fitness is chosen for removal. Since the diversity contribution of an individual is only defined relative to the subpopulation, the fitness of all the individuals in the subpopulation must be recalculated every time the subpopulation changes. The process of removing individuals from the subpopulation and updating the survivors' fitness is repeated until the size of the subpopulation is back at μ.

Computational experiments
To answer the research questions posed in the introduction and obtain well performing parameter values for the HGSRR metaheuristic, we design a sequence of six experiments. The R&R Search Experiment investigates the effect of adding LNS to the education phase of HGS, and determines what parameter values yield good performance. In the Neighborhood Selection Experiment, fifteen configurations of local search and LNS neighborhoods are tested to determine whether a reduced set of operators may be used without substantial loss of performance. The Elite Education Experiment determines the effect of devoting computational efforts to the creation of an elite solution in the initial population, and finds parameter values that aim to strike a good balance between the efforts devoted to elite education and the remaining search components. The Diversity Control Experiment investigates whether the introduction of an elite solution in the initial population necessitates stronger diversity control. The HGSRR metaheuristic configured through the above four experiments is compared to high performing competitors in a Benchmark Test. Finally, in the Best Known Solutions Experiment HGSRR is run for 16 hours with the aim of finding new best known solutions.

Experimental setup
The computational experiments are conducted on the X dataset by Uchoa et al. (2017) with 100 instances, where the number of customers vary between 100 and 1000. They cover a wide range of characteristics such as route length, customer and depot positioning, and demand distribution. The web page CVRPLIB (2021) contains instance definitions and best known solutions for the X dataset and other standard CVRP benchmarks. The first four experiments are run on a subset including 15 of the 100 benchmark instances. The selected instances are chosen such that they cover a wide range of different characteristics. The Benchmark Test is run on all 100 X-instances, whereas the Best Known Solutions Experiment is run on all X instances for which a proven optimal solution has not been found, plus a benchmark of ten very large CVRP instances due to . HGSRR was implemented using version 1.49.0 of the Rust programming language, and it was compiled with the Rust compiler. The implementation uses the same parameter values as in the original papers by Vidal (2020) and Christiaens and Vanden Berghe (2020), unless explicitly stated otherwise. A complete list of the parameters and their corresponding values is found in Appendix A.
The experimental setup is close to a replication of the experimental setup in Vidal (2020), such that the HGSRR results can be compared directly against the results presented there. However, to account for differences in computing power, the time limit for each problem instance is adjusted relative to the processors' single-threaded ratings. The original time limit T M AX is 240 100 n seconds, corresponding to four minutes of runtime per 100 customers. All the tests are single-threaded and they are performed on a 2.3GHz Intel E5-2670v3 processor, which according to PassMark Software (2021) is 23% slower than the processor used in the setup of Vidal (2020). Consequently, the time limit is extended by 23% in our setup.
For each run, the best solution value, z(t i ), is recorded after t i ∈ T percentages of the time limit.
The results of the experiments and the benchmark test conducted in this paper are then presented as convergence profiles. In the first four experiments, one convergence profile corresponds to a specific configuration of HGSRR 1 , while in the Benchmark Test presented in Section 3.6, the convergence profiles correspond to different methods. Furthermore, all the results are reported as the average of ten independent runs to increase statistical significance.
Since there is usually a trade-off between the solution quality and the CPU time, one method or configuration can, in general, only be considered better than another if it finds better solutions in a shorter amount of time. Therefore, we define two metrics from the convergence profiles, and these are used to determine which method or configuration is preferable. The first metric is called the final gap, denoted g F . It is defined as the gap between the solution after 100% of the time limit, denoted z F , and the best known solution (BKS) for the instance. The gap to the BKS after a given percentage of the time limit is defined in Equation (2). To get an accurate comparison between the HGSRR and the other metaheuristics in the benchmark test, the BKSs used are the same as the ones presented in Vidal (2020), even though better solutions have later been found for some of the instances.
The final gap is dependent on the predefined time limit and only represents the performance at a single point during the runtime, therefore, we also report the average gap, denoted g AV G , which is based on the primal integral defined by Berthold (2013). The primal integral is a calculation of the area under the curve of the convergence profile, and since the time is normalized, the primal integral is the same as the average gap relative to the BKS. It can be seen as the average of the best solution gaps if the runs were randomly stopped at any given time up until the time limit. To approximate the primal integral, let z AV G be defined as the average solution value, as shown in Equation (3), where t 0 = 0. The average gap is then calculated as the gap between the average solution value and the BKS using Equation (2).
To assure that the results of the experiments presented are statistical significant, we have used a one-tailed Wilcoxon signed-rank test (Wilcoxon 1945). In the test we evaluate two hypotheses, the null-hypothesis H 0 , and the alternative hypothesis H 1 , defined as follows: where z denotes the solution value, which can be either z F or z AV G . Each parameter configuration or method is denoted X , and we define X Best = argmin{z(X )}, i.e., it is the configuration with the lowest z-value for the given experiment. With a significance level α of 2.5%, the tested hypotheses with p-values lower than α are rejected.
Failing to reject H 0 means the performance of the two configurations in the experiment are statistically indistinguishable. However, if H 0 is rejected, we continue to test H 1 , and rejecting it as well implies that X Best performed better than X . These hypotheses are tested on the experimental results presented in this paper, and the configurations that are not significantly worse than X Best for a given experiment are highlighted in bold in Tables 1 to 5.

R&R search experiment
The goal of the R&R Search Experiment is to test whether R&R search performed after local search education can improve the original HGS. Recall from Section 2.5 that the R&R search is controlled by three parameters: γ that determines the number of R&R moves, the start temperature T 0 , and the final temperature T f . Together the two temperature parameters and the number of moves define the cooling schedule. We fix the final temperature, T f = 1, as this ensures a very low probability for accepting deteriorating moves in the final part of the search. That leaves the former two parameters, and the experiment is therefore designed to test different combinations Fig. 4 Convergence profiles from the R&R Search Experiment that tests different combinations of the γ and T 0 parameters, shown on a logarithmic scale of γ and T 0 . Note that the experiment is not designed to find the precise combination of parameters performing the best, but rather to find the correct order of magnitude. The following parameter value combinations are tested: γ ∈ {0, 0.1, 1.0, 10.0} and T 0 ∈ {1, 10, 50, 100}. Figure 4 shows the results from the R&R Search Experiment. Each curve in the figure is the convergence profile of the extended HGS metaheuristic for a specific combination of γ and T 0 . Notice that in the case of γ = 0, the R&R search is not run at all, and the method is the same as the original HGS. By looking at subfigures (c) and (d) with higher start temperatures, the results indicate that the method performs worse as γ increases. Even though an exponential cooling schedule is used, the combination of high start temperature and larger γ -values make the method spend too much time accepting deteriorating moves, and it becomes increasingly difficult to find an improved solution. On the other hand, if the start temperature is lower, as shown in subfigures (a) and (b), the results indicate that most of the combinations of parameters work somewhat better than the original HGS. Table 1 summarizes the metrics for the best parameter combinations found in the R&R Search Experiment, and compares them to our reimplementation of the original HGS. The results show that there is no statistically significant difference between any combination of γ ∈ {1.0, 10.0}, and T 0 ∈ {1, 10}. This indicates that the HGSRR is relatively robust with respect to the values of the γ and T 0 parameters. Further, the results show that dedicating some computing time towards R&R search leads to statistically significant better solutions than the original HGS. Even though there is  no statistically significant difference between results for the four parameter configurations, we have selected to proceed with the setting that gives the lowest g F and g AV G which is γ = 1.0 and T 0 = 10. The proportion of time spent on each of the components in the extended HGS metaheuristic are illustrated in Fig. 5. The results show that the temperature parameters affect the performance of the R&R search, but they are considered independent of its time consumption. I.e., the time consumption is correlated with the number of moves determined only by the γ parameter and the problem size. Therefore, each of the horizontal bars shows the distribution of time consumption for the different γ parameter values in the experiment. In the original HGS, corresponding to γ = 0, the local search is clearly the most computationally expensive component taking more than 85% of the time, on average. For the best performing parameter combination, the local search is still responsible for around 55%, while the R&R search takes around 35%. Out of the ten neighborhoods in the local search, the neighborhood defined by move M10 is the most expensive to explore with around 8% of the total time. This implies that with the best parameter configuration, the R&R-based neighborhood is by far the most time-consuming neighborhood to explore.

Neighborhood selection experiment
The Neighborhood Selection Experiment explores whether adding the R&R component allows us to remove some of the ten local search neighborhoods without significant performance deterioration. As Arnold and Sörensen (2019) points out, the use of several neighborhoods is only beneficial if they complement each other. I.e., the intersection between the neighborhoods should be small or empty so that each neighborhood finds different solutions. By design, this is the case for the ten neighborhoods used in the HGS local search. However, the R&R search component may in effect replace some of the local search neighborhoods. Thus, it might be beneficial for the extended HGS to only use a subset of the ten neighborhoods in order to avoid unnecessary computation. Table 2 shows the metrics for different combinations of search neighborhoods and the R&R component. The first row shows the results of the HGSRR with all eleven neighborhoods enabled, using the best parameter configuration from the R&R Search Experiment presented in Section 3.2. The next eleven rows present metrics for configurations where one of the eleven neighborhoods is removed, ordered by ascending average gap. The results give some indication on the relative importance of each neighborhood, however, the relatively small differences in addition to the lack of statistical significance make it difficult to draw any strong conclusions.
Separately removing the neighborhoods defined by moves M3, M2, M7, and M4 all improve performance, and these neighborhoods may therefore be considered less important. In contrast, removing the neighborhoods defined by R&R, M8, or M10 seems to worsen performance the most, and thus one could argue that these neighborhoods are more important. Since removing M3 yielded the best average gap, the same approach is repeated in which one more neighborhood is removed at a time. Unfortunately, no combination where removing M3 and one additional neighborhood achieved a better average or final gap, although the best one of these is listed in Table  2 as Subset 1.
Subset 2 and Subset 3 are the results of attempting to remove even more neighborhoods because a method with fewer neighborhoods requires less implementation and may be considered less complex. Subset 2 is defined by simultaneously removing all the neighborhoods that individually improved the average gap when being removed. Subset 3, on the other hand, is defined by removing all the neighborhoods that, when being the only one removed, did not increase the average gap by more than 0.1 percentage points compared to the full HGSRR. Despite none of the subsets achieving the best average or final gap, it is noteworthy that Subset 3, which removes six of the neighborhoods from the local search, still performs better than the original HGS without R&R.
The convergence profiles for the relevant neighborhood combinations from Table  2 are shown in Fig. 6. Even though the R&R Search Experiment described in Section 3.2 indicates that the inclusion of the R&R-based neighborhood positively impacts the method's convergence, Fig. 6 shows that the convergence can be further improved by removing a few neighborhoods. However, notice that it is no longer the case if too many neighborhoods are removed, as in Subset 2 and Subset 3.
Although there are several combinations of neighborhoods that we cannot statistically distinguish, the best performing combination, excluding the neighborhood defined by move M3, is used throughout the following experimental investigations.

Elite education experiment
Similar to the R&R Search Experiment, the Elite Education Experiment tests combinations of parameters in the R&R search, although this time it is used for elite education. Recall that elite education consists of the same education phase as previously, but   Table 2 the R&R search is run for much longer in order to generate a single incumbent elite solution before the rest of the population is initialized. The starting temperature is denoted T E 0 , and the parameter determining the number of R&R moves in the elite education is denoted γ E instead of γ . Once again, the experiment is designed to find parameter values that yield good performance, not necessarily optimal values for our experimental setup. Thus the following parameter value combinations are tested: γ E ∈ {1, 1000, 10000, 100000, 200000} and T E 0 ∈ {1, 10, 50, 100}. Note that γ E = 1 corresponds to no elite education, as the same value is used for education of the remaining individuals. I.e., γ = 1 is used during regular education.
The convergence profiles are shown in Fig. 7. Starting with subfigure (a), in which the starting temperature is the lowest, the results show little difference as long as the γ E parameter is relatively low. However, for the largest values, the convergence is much slower. This is likely due to the increased time spent on the elite education, leaving less time for the remaining components.
Subfigure (b) indicates that increasing the starting temperature to 10 finds somewhat better solutions early on, but has little impact on the final solution gap. Moreover, it is noteworthy that the solutions early on become better with a higher number of moves and a longer cooling schedule obtained by increasing γ E . The results indicate that spending time on elite education of a single individual improves the convergence of the method.
The same trend can be seen in subfigures (c) and (d) for values of γ E up to 10,000. However, a further increase in γ E , in combination with the higher starting temperatures, seems to considerably delay the convergence. A higher starting temperature implies a greater probability of accepting deteriorating moves early on, and thus, a stronger exploration of the search space during the R&R search. Despite delaying the convergence, the results indicate that the increased exploration from a higher starting temperature and a larger number of R&R moves, tend to improve the final gap. Notice that higher values for γ E increase the time spent on elite education and thus reduces  the number of iterations of the genetic cycle that can be performed within the given time limit. Table 3 shows the metrics for relevant parameter combinations from the Elite Education Experiment. We observe that a better final gap is achieved by increasing γ E , thus spending more time on the elite solution, although it cannot be increased too much. The best final gap is achieved with T E 0 = 50 and γ E = 100, 000, but increasing T E 0 to 100 does not make it statistically worse. Thus, the starting temperature may not be that important for the final gap when the elite education is given more time.
However, the gains in the final gap by increasing γ E above 10,000 seem to come at the expense of the average gap. The best average gap is achieved with T E 0 = 50 and γ E = 10, 000, where all other configurations are statistically worse. Both of the The computational time devoted to each of the components in the HGSRR, including the elite education, are shown in Fig. 8. The local search and the R&R search within the genetic cycle are grouped together as Education, while Elite Education represents the local search and R&R search on the single incumbent elite solution. In the experiment, γ E = 10, 000 achieved the best average gap, and Fig. 8 shows that HGSRR with this setting devotes around 5% of the total computing time to finding a single elite solution. The best final gap was achieved with γ E = 100, 000, and in that case, the elite education takes more than 40% of the total time spent. Even 5% is a significant amount to spend on a single individual, as the search process may visit thousands or even millions of unique individuals.

Diversity control experiment
Injecting a single, very good solution into the population of a genetic algorithm may lead to the population losing diversity as the injected solution may become too dominant. The loss of diversity decreases the genetic algorithm's ability to explore different parts of the search space, which is crucial for performance. Recall from Section 2.6 that the HGSRR, similar to the HGS, combines the diversity contribution of a solution with its objective value in order to calculate the fitness. Hopefully, this mechanism is enough to maintain diversity even though the population is initialized with an elite individual. The Diversity Control Experiment is designed to check the importance of the mechanism, and test different values for N E , the parameter controlling the diversity contribution in the fitness function. We test the two γ E values that yield the best metrics values in the Elite Education Experiment, in combination with N E ∈ {1, 2, 4, 6, 8}.
Convergence profiles for each of the two best performing γ E values from the Elite Education Experiment are shown in Fig. 9. The results indicate that the choice of N E plays a significant role in the performance of the method. Adjusting the N E parameter is one way to control the balance between intensification and diversification. Both subfigures show that the convergence profiles are almost overlapping during the beginning of the runs, as the N E parameter is irrelevant in the elite education phase. However, rather early in the search process, the convergence profiles diverge. The timing corresponds very well with the time spent on elite education, as presented in Fig. 8.   Table 4 presents the metrics of the best convergence profiles from the experiment. Regardless of the value of γ E , N E = 4 yields the best performance. However, reducing N E to 2 with γ E = 100, 000 does not statistically worsen the final gap, which is likely due to leaving less time for the genetic algorithm where the diversity is of importance. As it is preferable with a single single parameter value, N E = 4 is chosen, which is the same as used in Vidal (2020). Although the results show that the parameter is important for the performance, the injection of an initial elite individual into the population did not change the best parameter value. Therefore, the results indicate that the HGSRR metaheuristic does not require additional diversification to counteract the dominant elite solution. Since the best value has been used in the previous experiments, the best metrics are the same as shown in Table 3.

Benchmark test
This section presents the results of a complete benchmark test where the proposed HGSRR metaheuristic is compared against state-of-the-art metaheuristics on the full X dataset from Uchoa et al. (2017). Two different versions of HGSRR are included in the benchmark test, and these correspond to the two best configurations found in the Diversity Control Experiment. Let HGSRR1 denote the configuration achieving the best average gap (γ E = 10, 000), and HGSRR2 is the configuration with the best final  (2019), SISR by Christiaens and Vanden Berghe (2020), and HGS by Vidal (2020). A complete list with the results on each of the problem instances is found in Appendix B.
The convergence profiles of the methods in the Benchmark Test are presented in Fig. 10. Both versions of the HGSRR achieve a better final gap than the rest, although the differences are relatively small in terms of absolute values. This is not unexpected, given the hard competition. HGSRR1 achieved the best average gap in addition to a slightly better final gap than the original HGS. Furthermore, the best final gap is achieved by HGSRR2, but with a slower convergence, the method has a somewhat worse average gap.
Even though the computing times of each solution method presented in this section are adjusted based on the hardware used to produce them, other factors such as the choice of programming language, compiler, and other implementation details may still lead to an unfair comparison. To verify that we do not introduce an unfair advantage to HGSRR, we have tested our implementation of HGS in Rust against the original HGS implemented in C++. The results show that the two implementations have very similar convergence profiles, although the Rust implementation performs consistently and significantly worse, indicating that the real relative difference in performance in favor of HGSRR may be bigger than presented in this paper. Detailed results from the comparison of the two implementations of HGS may be found in Appendix C.
To get more information about the strengths and weaknesses of the different methods, their performance on various subsets of the instances is analyzed. Like in the benchmark test by Vidal (2020)   The convergence profiles of the methods in the benchmark on the different subsets are shown in Fig. 11, while the metrics are shown in Table 5. By inspecting the different subfigures and the summarized metrics, the same observation as previously can be seen. In general, HGSRR1 performed significantly better in terms of average gap, meanwhile the significantly best final gap was achieved by HGSRR2. However, this is not the case for the subset with the smallest problem instances, where the original HGS achieves the significantly best final and average gaps. It is only the final gaps on the instances with long routes that are not statistically different, where HGS and the two versions of HGSRR are indistinguishable within the given confidence level used in the statistical tests.
Most of the BKSs are proven to be optimal for the smallest instances, and the solutions achieved by the best metaheuristics are either the same as the BKSs, or very close. It can be argued that heavier intensification may help identify good solutions faster, although to find the optimal or close to the optimal solution, a strong diversification is required. Recall the trade-off between intensification and diversification. Since the original HGS does not include R&R search or elite education, the genetic algorithm can run more genetic cycles, or alternatively, achieve more restarts of the algorithm. As a result, the HGS is considered heavier weighted towards diversification compared to both versions of the HGSRR. Higher degree of diversification seems to be an advantage on the smaller instances, but in contrast, it may be a drawback on the larger instances in which even the best metaheuristics are still relatively far away from the best known solutions.
Despite both versions of the HGSRR performing slightly worse on the smaller instances, their performance on the other subsets is noteworthy. Especially their performance on the larger instances should be highlighted, as the final gap of the HGSRR2 is more than 0.04 percentage points better than the HGS, and the average gap of the HGSRR1 is more than 0.06 percentage points better than the HGS. The significant improvements by the two HGSRR versions on both of the metrics on the larger instances in terms of absolute values are the strongest contributor to their overall performance. On the complete X dataset, HGSRR2 reduces the final gap by 0.0122 percentage points, and HGSRR1 reduces the average gap by 0.0274 percentage points. Although the improvements are small in absolute numbers, they correspond to a relative reduction in the final and average gaps of 11.25% and 15.62%, respectively.
Since the difference in performance seems to be related to the size of the problems, and the final gaps are correlated with the problem size, it is interesting to take a closer look at the distributions of the final gaps. Fig. 12 shows the distributions of final gaps for each of the metaheuristics illustrated as boxplots, and the same five quartiles from the distributions are summarized in Table  6. As the solutions are closer to the BKSs on the smaller problem instances, both the 25 th percentile and the median are best for HGS that performs better on the smallest instances. At the same time, the 75 th percentile and the maximum are significantly lower for both versions of the HGSRR. The spread of the final gaps is clearly the lowest for HGSRR2. I.e., it has the lowest difference in the worst and the best final gaps across all the instances. A lower spread in the final gaps may indicate a more robust method in terms of lower variance in the final gaps on problem instances from the same distribution. Note that nothing can be said about the variance across multiple runs on the same problem instance, only about the expected variance across multiple instances from the same distribution as the benchmark instances.

Best known solutions experiment
In a final experiment we run the HGSRR for an extended time limit of 16 hours in an attempt to improve the best known solutions, which can be found on the CVRPLIB (2021) website. The experiment is conducted on the 48 instances in the X dataset for which a proven optimal solution has not yet been found. The parameter configuration remains the same as in the previous experiments, except for the γ E controlling the scope of the elite education. Because larger γ E resulted in better final gap in the Benchmark Test, the parameter value is further increased to 1,000,000 given the prolonged runtime. The experiment resulted in two new improved solutions, as listed in Table 7. An overview of the currently used solutions

Conclusions
The Capacitated Vehicle Routing Problem is a central problem in combinatorial optimization. Due to its combination of industrial relevance and scientific interest, the CVRP has been subject to intense research efforts for more than sixty years. Still, there is strong competition regarding efficient solution algorithms, and substantial improvements are being made. Among the most competitive heuristics for the CVRP of today are Hybrid Genetic Search (HGS) by Vidal (2020), and Slack Induction by String Removals (SISR) by Christiaens and Vanden Berghe (2020). HGS may be characterized as a memetic algorithm where a population of solutions is evolved, new solutions are created through recombination, and the solutions are improved by local search. In contrast, SISR is a trajectory-based metaheuristic where a single solution is iteratively improved through large neighborhood search (LNS) with ruin and recreate (R&R) operators and acceptance is governed by simulated annealing.
To meet our goal of investigating combinations of HGS and LNS for the CVRP, we formulated three research questions. To answer them, we developed a novel metaheuristic and conducted a sequence of six computational experiments. Our proposed metaheuristic is called Hybrid Genetic Search with Ruin and Recreate (HGSRR). It combines HGS and SISR in two different ways: First, the R&R based LNS of SISR complements the local search operators of HGS in the improvement of solutions in each generation of the population, also called education. Second, SISR is used to create an elite solution in the initial population.
Our experimental results show that devoting a substantial part of the computing time in the education phase (around 35%) to the R&R large neighborhood search of SISR yields the best results and improves both the average gap and the final gap, compared to the original HGS and SISR methods. Further, our experiments clearly indicate that adding SISR changes the relative importance of the local search operators employed by HGS. In fact, the results show that removing the M3 neighborhood operator (i.e., relocation of a sequence of two customers) from the ten original operators of HGS provides the best results of the 15 configurations investigated. SISR may even replace six of the ten HGS operators and still exhibit performance improvement relative to the original HGS.
Our results also show that it pays off to devote between 5% and 40% of the total computational effort to create an elite solution for the initial population, both in terms of convergence and the final gap. However, different configurations yield the best results for these two performance metrics. Further, we observe that injecting an elite solution in the initial population of HGS does not require additional diversity control mechanisms.
The results of the Benchmark Test on the 100 X-instances by Uchoa et al. (2017) show that HGSRR is highly competitive. We investigated two configurations of HGSRR: HGSRR1 that focuses on average gap performance, and HGSRR2 that is the best configuration for the final gap metric. Both outcompete four state-of-the-art metaheuristics regarding final gap, including HGS and SISR, on all but the smallest X-instances. With a third configuration and a 16 hour time limit, HGSRR found new best known solutions to two X instances with 383 and 640 customers, respectively.
Hybrid genetic search and R&R based large neighborhood search are both among the best solution methods for vehicle routing and many other combinatorial optimization problems. Our investigations show that it is possible to design combinations of the two that are competitive and in fact improve performance relative to each of them. Our results should motivate further work in this direction.
The HGSRR metaheuristic is controlled by a set of several different parameters, as defined throughout Section 2. Because the HGSRR combines two metaheuristics, the parameters are divided into two tables. Table 8 shows all the parameters used in the genetic algorithm based on the HGS by Vidal (2020), and Table 9 presents  the parameters used to control the R&R search based on the SISR metaheuristic by Christiaens and Vanden Berghe (2020). The first four parameters in Table 9 are used to control the R&R operators themselves, and for sake of completeness, they are included in the Table despite not being defined in Section 2. For information about the first four parameters, we refer to the original paper. All the parameters and values are taken from the respective papers, except for the four last parameters in Table 9. Parameter T 0 takes another value than in Christiaens and Vanden Berghe (2020), while T E 0 , γ , and γ E are introduced in this paper.

Appendix B: Benchmark test results
This appendix presents all the results from the Benchmark Test presented in Section 3.6. The results are reported as the final solution value, denoted z F , and the average solution value, denoted z AV G . For definitions of the values, see Section 3.1.
Recall that for all the problem instances, ten independent runs were performed with each of the metaheuristics. Therefore, for each metaheuristic and problem instance, the value from the best run is reported, in addition to the average value from the ten runs. The best z F and the average z F from the ten runs can be found in Table 10, Table  11, and Table 12. Similarly, the best z AV G and the average z AV G from the ten runs can be found in Table 13, Table 14, and Table 15.