Introduction

The capacitated vehicle routing problem (CVRP) is a vehicle routing problem (VRP) in which the constraints lie mainly in the vehicle capacities and the maximum distance that each vehicle can travel (Zhang et al. 2008). The capacity limit placed on the vehicles is what defines the CVRP and arguably brings it closest to the constraints faced by real-life applications. The basic VRP does not take vehicle capacity into consideration, even though capacity plays a major role in distribution decision making.

The CVRP is considered a fundamental problem in combinatorial optimization, specifically in transportation and distribution logistics (Kara et al. 2007). It forms a necessary basis of logistics planning (Chandran and Raghavan 2008) and plays a critical part in a wide range of practical applications. A solution to this routing problem would greatly benefit businesses and industries involved in transportation and distribution.

The cuckoo search (CS) is a metaheuristic algorithm proposed by Yang and Deb (2009) based on the obligate brood parasitic behavior of some cuckoo species. The algorithm is notably enhanced when combined with Lévy flights (LF) rather than a simple isotropic random walk (Yang 2014). The LF is a behavior described by Reynolds and Frye (2007) as the way many animals and insects explore their landscape: a series of straight flight paths punctuated by sudden 90° turns. The potential of the CS algorithm has been documented in several studies (Vázquez 2011; Yang et al. 2012; Gandomi et al. 2013; Kaveh and Bakhshpoori 2013; Yildiz 2013).

In this paper, we explored the implementation of the CS algorithm with LF for solving the CVRP. We tested the algorithm on the problem instances from Augerat et al. (1995) and recorded its performance in terms of solution quality, with observations of the running times. The goal of this study was to add to the literature on the performance of the CS algorithm in general and on its performance as a solution method for the CVRP in particular. Additionally, it may motivate future research of the same nature, since finding optimal solutions to routing problems such as the CVRP is vital in real-life applications involving transportation and distribution, one of the core requirements for satisfying the demands of a consumer base.

The remainder of this paper is organized as follows: the CVRP is formally presented and approaches to solving it are reviewed in Sect. 2, the CS algorithm is formally presented in Sect. 3, the implementation of the CS algorithm for the CVRP is presented in Sect. 4, the experimental results are presented in Sect. 5, and conclusions and recommendations are given in Sect. 6.

Problem

Capacitated vehicle routing problem

The CVRP is a situation in which a number of customers with individual demands are satisfied by a number of homogenous vehicles, each with a given capacity, from a central depot. The objective of the problem is to determine the set of optimal routes traveled by a number of identical vehicles with minimal travel costs (Toth and Vigo 2002). A route is feasible when it begins and ends at the central depot, with each customer serviced exactly once, and the total demand on any route does not exceed the vehicle’s capacity (Christiansen and Lysgaard 2007).

Formally, in the CVRP, a nonnegative demand \(q_{{u\left( {i,k} \right)}}\)—where \(u\) represents the route, \(i\) the customer, and \(k\) the vehicle—of a single commodity is to be delivered to \(n\) customers from a central depot using \(K\) independent delivery vehicles of identical capacity \(C\) (Ralphs et al. 2003; Kumar et al. 2014). Further, the goal is to complete the delivery with minimal distance \(D\) and least total cost, with \(d_{{u\left( {i,k} \right),u\left( {i + 1,k} \right)}}\) denoting the distance traveled for vehicle \(k\) from customer \(i\) to customer \(i + 1\), \(d_{k}\) the total distance traveled by a vehicle \(k\), and \(N_{k}\) the number of customers visited by vehicle \(k\). Distances between two customers are calculated using the Euclidean distance formula. Additionally, each route must begin and end at the depot, each customer is part of exactly one route, and the total demand of each route does not exceed vehicle capacity \(C\).

Kumar et al. (2014) translate the description to a mathematical representation:

$${\text{Min}}\, D = \mathop \sum \limits_{k = 1}^{K} d_{k} ,$$
(1)

where \(d_{k} = \mathop \sum \limits_{i = 0}^{{N_{k} }} d_{{u\left( {i,k} \right),u\left( {i + 1,k} \right)}}\) and \(d_{{u\left( {i,k} \right),u\left( {i + 1,k} \right)}}\) is calculated using the Euclidean distance formula

$${\text{Min }}K,$$
(2)
$$\mathop \sum \limits_{i = 1}^{{N_{k} }} q_{{u\left( {i,k} \right)}} \le C\quad\forall k = 1, 2, \ldots , K.$$
(3)

According to Kumar et al. (2014), Eqs. (1), (2), and (3) represent, respectively, the objective of minimizing the total distance traveled by all vehicles, the objective of minimizing the total number of vehicles used, and the vehicle capacity constraint.
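As a concrete illustration of Eqs. (1)–(3), the following minimal Java sketch (class and method names are ours, not taken from any reference implementation) evaluates the total distance of a candidate set of routes with Euclidean arc costs and checks the capacity constraint:

```java
import java.util.List;

// Minimal sketch (illustrative names, not the authors' code) of evaluating
// Eq. (1) and constraint (3) for a set of routes, using Euclidean distances.
class RouteEvaluator {

    // A customer is a point (x, y) with a nonnegative demand q.
    record Customer(double x, double y, int demand) {}

    static double euclidean(Customer a, Customer b) {
        return Math.hypot(a.x() - b.x(), a.y() - b.y());
    }

    // d_k of Eq. (1): a route runs depot -> customers -> depot.
    static double routeDistance(Customer depot, List<Customer> route) {
        double d = 0.0;
        Customer prev = depot;
        for (Customer c : route) {
            d += euclidean(prev, c);
            prev = c;
        }
        return d + euclidean(prev, depot);   // return to the depot
    }

    // Constraint (3): total demand on the route must not exceed capacity C.
    static boolean feasible(List<Customer> route, int capacity) {
        return route.stream().mapToInt(Customer::demand).sum() <= capacity;
    }

    // Objective (1): total distance D over all K routes.
    static double totalDistance(Customer depot, List<List<Customer>> routes) {
        return routes.stream().mapToDouble(r -> routeDistance(depot, r)).sum();
    }
}
```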

Approaches to solving CVRP

A solution to the CVRP consists of a collection of \(K\) routes with minimum travel cost for \(K\) identical vehicles, each with capacity \(C\), such that each route begins and ends at the depot, each customer is visited by exactly one route, and the sum of the demands of the customers visited by a route does not exceed the vehicle capacity \(C\) (Ralphs et al. 2003; Kumar et al. 2014). Since the CVRP is a variation of the vehicle routing problem (VRP), which is an extension of the traveling salesman problem (TSP), the foundation of many exact approaches for the CVRP is derived from the extensive and successful work done on the exact solution of the TSP (Toth and Vigo 2014). The CVRP has been extensively studied since the early 1960s, and in recent years many new heuristic and exact approaches have been presented (Toth and Vigo 2002). However, despite the huge progress made since the first algorithms, such as the tree search method of Christofides and Eilon (1969), the CVRP is still far from being satisfactorily solved (Toth and Vigo 2014).

Exact techniques for the CVRP are applicable only to small-scale problems, and the quality of constructive heuristics is often unsatisfactory (Huang et al. 2008). The instances that the most effective exact algorithms proposed so far can consistently solve contain up to 50 customers, while larger instances, with hundreds of customers, can be solved only in particular cases and are otherwise tackled with heuristic methods (Toth and Vigo 2002). Despite the lack of a conclusive approach, the research community's ability to solve these problems has improved enormously thanks to better algorithms and greater computational power (Chandran and Raghavan 2008).

A study by Fukusawa et al. (2006) applied a robust branch-and-cut-and-price algorithm to the CVRP. Fukusawa et al. (2006) used four problem instances—P-n16-k8, P-n23-k8, P-n40-k5, and P-n101-k4—from the study of Augerat et al. (1995). The algorithm was able to produce near-optimal solutions for the four problem instances.

Ren (2012) applied a genetic algorithm and was able to achieve desirable results with high efficiency and a fast convergence rate. Shin and Han (2011) implemented a centroid-based heuristic algorithm that achieved better results than the sweep heuristic on many of the problem instances in sets A, B, and P of the Augerat et al. (1995) benchmark dataset.

Lin et al. (2009) applied a hybrid algorithm of simulated annealing and tabu search and tested the algorithm on 14 classical problems and 20 large-scale benchmark instances. The hybrid algorithm achieved best solutions for 8 out of the 14 classical problems and exhibited competitive performance with other algorithms.

Mazzeo and Loiseau (2004) developed an ant colony optimization (ACO) algorithm based on a metaheuristic technique introduced by Colorni et al. (1991). Their algorithm showed very good performance on problem instances with up to 50 nodes and promising performance on larger problem instances.

Kumar et al. (2014) reviewed other works on the CVRP which showed promising performance. They also introduced a genetic algorithm with a new fitness assignment procedure and tested it on standard benchmark problems. The results suggest that the algorithm is highly competitive with other algorithms and effective for multi-objective optimization of vehicle routing problems.

Desired algorithm

Cuckoo search

Yang and Deb’s (2009) take on modeling the behavior implemented in the CS algorithm is as follows:

  1. Each cuckoo lays one egg at a time and dumps its egg in a randomly chosen nest;

  2. the best nests with high-quality eggs will carry over to the next generation; and

  3. the number of available host nests is fixed, and the egg laid by a cuckoo is discovered by the host bird with a probability \(p_{\alpha } \in [0, 1]\). In this case, the host bird can either get rid of the egg, or simply abandon the nest and build a completely new nest.

In other words, there is always a probability that the cuckoo egg is discovered by the host. The effect of an egg being discovered and consequently thrown away or abandoned is approximated by replacing a fraction \(p_{\alpha }\) of the \(n\) host nests with new nests (containing new random solutions).

These three items that model the behavior of cuckoos provide a selection process for the CS algorithm, mimicking a “survival of the fittest” characteristic because it ensures that the best eggs survive from generation to generation (Yang 2010).

Furthermore, the algorithm's strength comes from its use of the LF pattern, especially when the global random walk is carried out (Yang 2014). Given the modeled behavior above, a method of generating new eggs is required, and this is where the LF is applied.

Essentially, there are three components in this algorithm: selection of the best, exploitation by a local random walk, and exploration by randomization via LF globally (Yang and Deb 2010). To control the step size (the random pattern) of the LF in generating solutions, a user-specified coefficient \(\alpha\) is defined. Yang and Deb (2009) stated that in most cases we can set \(\alpha = 1\). When a Lévy step is generated using a random number generator, it is first multiplied by \(\alpha\) before it is used to generate a new egg. A cuckoo laying an egg thus corresponds to generating a new solution for cuckoo \(i\).

Furthermore, the CS algorithm has only two parameters to be adjusted (Yang and Deb 2009):

  1. The LF step size coefficient \(\alpha\), which scales the flight by multiplication and should be related to the size of the solution space of the objective function; and

  2. the fraction \(p_{\alpha }\) of eggs to be discarded. This value dictates how much exploration the algorithm performs: increasing it reduces the chance of getting trapped in a local minimum, while decreasing it increases that chance. Equivalently, once the number of host nests \(n\) is fixed, \(p_{\alpha }\) essentially controls the elitism and the balance between randomization and local search.

Yang and Deb (2009) found in their validation and testing of the CS algorithm that the convergence rate is, to some extent, insensitive to the parameter \(p_{\alpha }\), which means that there is no need to fine-tune this parameter for a specific problem. In their tests, where \(n = 15, \;16, \; \ldots , \;50\) and \(\alpha = 1\), they found that \(n = 15\) and \(p_{\alpha } = 0.25\) are sufficient for most optimization problems.

These two parameters are far fewer user-specified parameters than in other optimization algorithms of this type (Yang et al. 2013). In general, the fewer parameters an algorithm requires, the more generic it tends to be: complexities such as parameters unduly influencing the results are avoided, and the algorithm can perform its task without being constrained by excessive tuning.

Yang (2010) simplifies the general implementation of the CS algorithm with the following representation: each cuckoo egg in a host nest represents a solution, and each cuckoo can lay only one egg, which represents one solution. Each egg carries two pieces of information: its coordinates in the solution space and its fitness value. With this information, the new egg/solution is evaluated; if it is significantly better or has more potential, it replaces the previous, now inferior, solution in the nest. The algorithm can also be extended so that each nest contains more than one egg, representing a set of solutions.

Like other metaheuristic algorithms, the CS execution depends on a stopping criterion. In Bacanin's (2011) object-oriented software implementation of a novel version of the CS algorithm, the stopping criterion is \({\text{maxGeneration}} = 500\), giving 500 cycles per run. Bacanin (2011) states that the results of his implementation on the four benchmarks used for performance evaluation are of optimal value and that, for a reasonable threshold of \(10^{ - 15}\), the results are perfect. This coincides with Yang and Deb's (2009) review of their algorithm, in which the algorithm is run at least 100 times and each run stops when the variations of the function values are less than the given tolerance \(\varepsilon \le 10^{ - 5}\).

Additionally, Yang and Deb's (2009) review of the CS algorithm includes a comparison with the genetic algorithm (GA) and particle swarm optimization (PSO). While those two achieve success rates ranging from 77 to 100%, CS aces all ten standard optimization benchmarks, with each algorithm run at least 100 times to allow meaningful statistical analysis. Yang and Deb (2009) attribute this performance primarily to the fine balance of randomization and intensification and to the small number of parameters to be fine-tuned, which makes CS markedly better suited and more efficient for multimodal objective functions. These results emphasize the potential of CS, with Yang (2010) himself stating that CS is potentially far more efficient than other algorithms with similar goals.

Across these research findings, CS has been found to be more generic and robust for many optimization problems than other metaheuristic algorithms. This is not to say that CS cannot be hybridized with the algorithms mentioned above; such hybrids have been developed by other researchers, notably Kundra and Sadawarti (2015), and can produce even better outcomes.

Lévy flights

The foraging path of any animal is effectively a random walk, as the next move is based on the current location and the transition probability to the next location (Melin et al. 2015). Such randomization can be carried out in three ways: uniform randomization, random walks, and heavy-tailed walks (Yang 2010). The LF is a foraging pattern belonging to the heavy-tailed walks and is a flight strategy exhibited by many organisms, such as fruit flies (Drosophila melanogaster).

LF essentially provides a random walk whose random step length is drawn from a Lévy distribution:

$${\text{L\'{e}vy}} \sim u = t^{ - \lambda } ,\quad (1 < \lambda \le 3),$$
(4)

which has an infinite variance with an infinite mean (Yang 2010). This randomization plays an important role in both exploration and exploitation in metaheuristic algorithms such as the CS (Kaveh and Bakhshpoori 2013).

The LF pattern can also be described as many relatively short steps (corresponding to the detection range of the searcher) separated by occasional longer jumps (Noah et al. 2013). Another description of the LF pattern is an intensified search around a solution, followed by big steps in the long run (Ouaarab et al. 2014). This is what is called a Lévy-flight-style intermittent scale-free search pattern (Roy and Chaudhuri 2013). A scale-free search pattern means that the search does not differ with scale and presents the same fractal patterns regardless of the range over which it is viewed (Noah et al. 2013). In terms of searching the solution space, small-scale searches occur locally while large-scale searches occur globally, leading to an automatic balance between exploration and refinement (Yang et al. 2013). This means that when LF generates new solutions, the search mostly stays around the best solution obtained so far, which speeds up the local search. However, to avoid being trapped in a local optimum, that is, being stuck with a solution that is best only within a small area of the solution space, some of the new solutions are also generated by far-field randomization, at locations sufficiently far from the current best solution (Yang and Deb 2009). Thus, LF plays a crucial role in controlling the balance between intensification and diversification (Ouaarab et al. 2014). It is the exponential property of LF that gives it its scale-invariant character (Roy and Chaudhuri 2013). Yang and Deb (2009) also state that in most optimization problems, LF makes the search for a new best solution more efficient. An example of an LF pattern is shown in Fig. 1.

Fig. 1 a An example of a 100-step LF and b a zoomed-in section of the same LF (from Yang et al. 2013)

In the CS algorithm, the selection of the best by keeping the best nests or solutions is equivalent to some form of elitism commonly used in genetic algorithms (Yang and Deb 2010). This elitism secures the best solution’s position in the population by constantly passing it to the next generation with no risk of it being eliminated. The exploitation around the best solutions is performed by using a local random walk (Yang and Deb 2010):

$$x^{t + 1} = x^{t} + \alpha \varepsilon_{t} ,$$
(5)

where \(x^{t + 1}\) is the new solution generated using LF, \(x^{t}\) is the current best solution from which the new solution is derived, and \(\alpha\) is the step size parameter mentioned above. If \(\varepsilon_{t}\) follows a Gaussian distribution, this becomes a standard random walk; if \(\varepsilon_{t}\) is drawn from a Lévy distribution, the steps are larger and can be more efficient (Yang and Deb 2010). A step can, however, be too large, with the risk that the move lands too far away. Fortunately, the elitism described above keeps the exploitation moves within the neighborhood of the best solutions by retaining the best solutions of each iteration.

In Kaveh and Bakhshpoori's (2013) study, a more explicit version of Eq. (5) is presented in which, instead of \(\varepsilon_{t}\) representing the random walk, the parameter S represents the length of the random walk with LF according to Mantegna's algorithm:

$$x^{t + 1} = x^{t} + \alpha \cdot S.$$
(6)

A random walk is a process which consists of taking a series of consecutive random steps. It can be expressed as

$$S_{n} = \mathop \sum \limits_{i = 1}^{n} X_{i} = X_{1} + X_{2} + \cdots + X_{n} = \mathop \sum \limits_{i = 1}^{n - 1} X_{i} + X_{n} = S_{n - 1} + X_{n} ,$$
(7)

where \(S_{n}\) represents the random walk with \(n\) random steps and \(X_{i}\) represents the \(i\)th random step with predefined length. The step size or length can vary according to the chosen distribution; in this study it follows the Lévy distribution.

In terms of implementation (Kaveh and Bakhshpoori 2013), generating numbers with LF consists of two steps: the choice of a random direction and the generation of steps obeying the chosen Lévy distribution. Generating the steps can be tricky, but there are several ways to do it; one of the most efficient and straightforward is Mantegna's algorithm. In Mantegna's algorithm (Mantegna 1994), the step length S is calculated by

$$S = \frac{u}{{|v|^{1/\beta } }},$$
(8)

where β is a parameter in the interval [1, 2], here taken to be 1.5, and the variables u and v are drawn from normal distributions as

$$u\sim N(0, \sigma_{u}^{2} ),\;\;v\sim N(0, \sigma_{v}^{2} ),$$
(9)

where

$$\sigma_{u} = \left\{ {\frac{{\varGamma \left( {1 + \beta } \right) { \sin }\left( {\frac{\pi \beta }{2}} \right)}}{{\varGamma \left[ {(1 + \beta )/2} \right] \beta 2^{(\beta - 1)/2} }}} \right\}^{1/\beta } ,\;\sigma_{v} = 1.$$
(10)
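The following Java sketch illustrates Mantegna's procedure of Eqs. (8)–(10) with β = 1.5, together with the move of Eq. (6); it is an illustrative implementation (including a standard Lanczos approximation of the Gamma function, since the Java standard library has none), not the code used in this study:

```java
import java.util.Random;

// Sketch of a Levy step via Mantegna's algorithm, Eqs. (8)-(10), with beta = 1.5
// as in the text (illustrative, not the authors' code).
class MantegnaLevy {
    private static final double BETA = 1.5;
    private static final double SIGMA_U = Math.pow(
            (gamma(1 + BETA) * Math.sin(Math.PI * BETA / 2))
            / (gamma((1 + BETA) / 2) * BETA * Math.pow(2, (BETA - 1) / 2)),
            1.0 / BETA);                       // Eq. (10); sigma_v = 1

    private final Random rng = new Random();

    // Eq. (8): S = u / |v|^(1/beta), with u ~ N(0, sigma_u^2), v ~ N(0, 1).
    double levyStep() {
        double u = rng.nextGaussian() * SIGMA_U;
        double v = rng.nextGaussian();
        return u / Math.pow(Math.abs(v), 1.0 / BETA);
    }

    // Eq. (6): move from the current position by alpha * S (continuous form).
    double levyMove(double current, double alpha) {
        return current + alpha * levyStep();
    }

    // Lanczos approximation of the Gamma function.
    private static double gamma(double x) {
        double[] g = {676.5203681218851, -1259.1392167224028, 771.32342877765313,
                -176.61502916214059, 12.507343278686905, -0.13857109526572012,
                9.9843695780195716e-6, 1.5056327351493116e-7};
        if (x < 0.5) return Math.PI / (Math.sin(Math.PI * x) * gamma(1 - x));
        x -= 1;
        double a = 0.99999999999980993, t = x + 7.5;
        for (int i = 0; i < g.length; i++) a += g[i] / (x + i + 1);
        return Math.sqrt(2 * Math.PI) * Math.pow(t, x + 0.5) * Math.exp(-t) * a;
    }
}
```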

Despite its effective search pattern, LF is not without drawbacks: because it is inherently random, reaching the optimum cannot always be guaranteed (Yang 2010). Nevertheless, LF is one of the most powerful features of CS (Yang et al. 2013) and is central to the algorithm's performance.

LF is not a search pattern exclusive to CS. Another application of LF can be seen in Pavlyukevich's (2007) study on non-local search and simulated annealing. LF also appears in studies of human foraging behavior, and even light can be related to LF (Yang 2010).

Fitness function

A fitness function measures the potential of each solution. The fitness function specific to the CVRP, taken from Kumar et al. (2014), is as follows:

$$F(D)_{i} = \frac{{(D)_{ \hbox{max} } - (D)_{i} }}{{(D)_{ \hbox{max} } - (D)_{ \hbox{min} } }}\quad\forall i = 1, 2, \ldots , S,$$
(11)

where \(F(D)_{i}\) is the fitness function value of distance traveled for ith solution in a population, \((D)_{ \hbox{max} }\) is the maximum distance traveled in a population, \((D )_{ \hbox{min} }\) is the minimum distance traveled in a population, \((D)_{i}\) is the distance traveled for the ith solution in a population, and \(S\) is the size of the population.

\(F(D)\) is calculated for all solutions in a population. Since this is a minimization problem, a solution with a higher fitness value (closer to 1) corresponds to a shorter total distance and is therefore closer to optimal. Other forms of fitness can be defined in a way similar to the fitness function in genetic algorithms (Yang and Deb 2009).
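A direct Java rendering of Eq. (11), written as a hypothetical helper for this study's setting, could look as follows:

```java
// Illustrative rendering of Eq. (11): distances are rescaled so that the
// shortest solution in the population gets fitness 1 and the longest gets 0.
final class Fitness {
    static double[] of(double[] distances) {
        double max = Double.NEGATIVE_INFINITY, min = Double.POSITIVE_INFINITY;
        for (double d : distances) {
            max = Math.max(max, d);
            min = Math.min(min, d);
        }
        double[] f = new double[distances.length];
        for (int i = 0; i < distances.length; i++) {
            // Guard against a degenerate population where all distances are equal.
            f[i] = (max == min) ? 1.0 : (max - distances[i]) / (max - min);
        }
        return f;
    }
}
```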

Cuckoo search applications and other modifications

The CS algorithm has been applied to a variety of problems beyond combinatorial optimization. These include structural optimization problems, which are highly nonlinear and involve large numbers of design variables with complex constraints (Gandomi et al. 2013), business optimization applications (Yang et al. 2012), optimal machining parameters in milling operations (Yildiz 2013), and the optimum design of steel frames (Kaveh and Bakhshpoori 2013). The CS algorithm also appears in the field of machine learning, where it has been used for training spiking neural networks (Vázquez 2011).

All of the studies mentioned point out the significantly better performance of the CS algorithm, owing to its fewer parameters, compared with other algorithms, and they likewise acknowledge its overall potential. Furthermore, despite being relatively new compared with other commonly used optimization algorithms (CS was introduced in 2009, PSO in 1995, and GA in the 1970s), the CS algorithm has already been modified and improved upon: a modified CS, a gradient-free optimization algorithm, adds information exchange between the top eggs, or best solutions (Walton et al. 2011). There is also the multi-objective CS algorithm for design optimization by Yang and Deb (2013) and an improved CS algorithm for feed-forward neural network training by Valian et al. (2011).

Methodology

Problem instances

All the problem sets (A, B, and P) from the benchmark dataset of Augerat et al. (1995) were used in this study. Only problem instances with at least four vehicles were considered, owing to a restriction (see Sect. 4.3.4) in one of the operations used by the algorithm.

Problem representation

Toth and Vigo (2002) defined the CVRP as the following graph-theoretic problem. Let \(G = \left( {V, A} \right)\) be a complete undirected graph, where \(V = \{ 0, \ldots , n\}\) is the vertex set with a corresponding demand set \(Q\), and \(A\) is the arc set of undirected edges. Vertices \(j = 1, \ldots , n\) correspond to the customers, each with a known nonnegative demand \(q_{j}\) to be delivered, whereas vertex \(0\) corresponds to the depot with demand \(q_{0} = 0\). Given a customer set \(S \subseteq V\), let \(d\left( S \right) = \mathop \sum \nolimits_{j \in S} q_{j}\) denote its total demand. To illustrate \(V\), a \(2 \times m\) matrix is shown in Eq. (12), where each column holds the coordinates of one vertex or customer. Equation (13) denotes \(Q\), the demand set, where each element corresponds to one vertex or customer:

$$V = \left( {\begin{array}{*{20}c} {x_{0} } & {x_{1} } & \ldots & {x_{m} } \\ {y_{0} } & {y_{1} } & \ldots & {y_{m} } \\ \end{array} } \right),$$
(12)
$$Q = \left( {q_{0} , q_{1} , \ldots , q_{n} } \right).$$
(13)

Furthermore, a nonnegative distance \(d_{ij}\) is associated with each arc \(\left( {i, j} \right) \in A\) and represents the travel cost of going from vertex \(i\) to vertex \(j\). Since \(d_{ij} = d_{ji}\) for all \(i, j \in V\), this is a symmetric CVRP (SCVRP); in addition, loop arcs \((i, i)\) are not allowed. The arc set \(A\) is shown in Eq. (14) and is composed of the edges of the graph, expressed as \(a_{ij}\). The cost \(d_{ij}\) associated with each arc \(\left( {i, j} \right) \in A\) is defined as the Euclidean distance between the two points corresponding to vertices \(i\) and \(j\).

$$A = \left( {\begin{array}{*{20}c} {a_{10} } & {a_{21} } & \ldots & {a_{1j} } \\ {a_{20} } & {a_{31} } & \ldots & {a_{2j} } \\ \vdots & \vdots & {} & \vdots \\ {a_{i0} } & {a_{i1} } & \ldots & {a_{ij} } \\ \end{array} } \right)$$
(14)

The graph \(G\) includes the arcs connecting all vertex pairs, with the exception of loops. The problem is interpreted under the assumption that all the nodes presented, including the depot, are fully interconnected in a complete graph.
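Under this interpretation, a problem instance reduces to the coordinate matrix of Eq. (12), the demand vector of Eq. (13), and a symmetric Euclidean cost for every arc. A minimal Java sketch of such an instance (illustrative names, not the authors' code) is:

```java
// Sketch (illustrative) of the SCVRP data: coordinates (Eq. 12), demands (Eq. 13),
// and the symmetric Euclidean cost d_ij associated with every arc (i, j), i != j.
class CvrpInstance {
    final double[][] coords;   // coords[i] = {x_i, y_i}; index 0 is the depot
    final int[] demand;        // demand[0] = 0 for the depot
    final double[][] dist;     // dist[i][j] = dist[j][i]; dist[i][i] unused (no loops)

    CvrpInstance(double[][] coords, int[] demand) {
        this.coords = coords;
        this.demand = demand;
        int n = coords.length;
        this.dist = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double d = Math.hypot(coords[i][0] - coords[j][0],
                                      coords[i][1] - coords[j][1]);
                dist[i][j] = d;      // symmetric: d_ij = d_ji
                dist[j][i] = d;
            }
        }
    }
}
```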

CS algorithm on CVRP

The flowchart (Fig. 2) and the parameters (Table 1) are presented below. The values assigned to \(n\) and \(p_{\alpha }\) were based on Yang and Deb (2009), while the value for maxGeneration was based on Bacanin (2011).

Fig. 2 CS algorithm flow, based on the pseudocode presented by Yang and Deb (2009)

Table 1 CS algorithm parameters
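Table 1 itself is not reproduced here; based on the values stated in the text, the parameter settings can be summarized as the following constants (an assumed summary, not the authors' code):

```java
// Assumed summary of the parameter settings stated in the text
// (n and p_a from Yang and Deb 2009; maxGeneration from Bacanin 2011).
final class CsParameters {
    static final int    HOST_NESTS     = 15;    // n, number of host nests/solutions
    static final double P_A            = 0.25;  // fraction of worst nests abandoned
    static final double ALPHA          = 1.0;   // LF step size coefficient
    static final int    MAX_GENERATION = 500;   // iterations per run

    private CsParameters() {}                   // constants holder, not instantiable
}
```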

Flow of the algorithm

The initial 15 solutions are generated using the information from the problem instance. A solution is then randomly chosen from the set of initial solutions for comparison. The second solution for the comparison is also chosen from the set of initial solutions, but it undergoes improvement determined by the LF value and the 2-opt and double-bridge operations. The LF value is determined using Eqs. (8), (9), and (10).

The LF value dictates the number of times the 2-opt or double-bridge operation is applied. The algorithm then either continues applying the operation for the remainder of the iteration or proceeds to the comparison of the first randomly chosen solution and the second, LF-improved solution. During the comparison, the better solution is kept; if it is the LF-improved solution, it takes the place of the randomly chosen one. The fitness of all solutions is then recalculated, the solutions are ranked, and the three worst nests or solutions (based on the algorithm parameters) are removed and replaced by newly generated solutions. After the population size is restored with the new solutions, the fitness values are again calculated and the solutions ranked.

The whole process constitutes a single iteration. After completing 500 iterations, the best solution is recorded to mark the end of a single run.
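The flow described above can be summarized by the following high-level Java sketch; Solution and the helper methods are placeholders for this study's data structures and operators, declared abstract only to keep the sketch compilable:

```java
import java.util.List;
import java.util.Random;

// High-level sketch of one run, following the flow described above.
abstract class CuckooSearchCvrp {
    static final int HOST_NESTS = 15, MAX_GENERATION = 500;   // values as in Table 1
    static final double P_A = 0.25;
    final Random rng = new Random();

    interface Solution { double totalDistance(); }

    abstract List<Solution> initialSolutions();                // Sect. 4.3.2
    abstract Solution levyImprove(Solution s);                 // 2-opt / double-bridge, Levy-scaled
    abstract void rankByFitness(List<Solution> nests);         // Eq. (11)
    abstract void replaceWorst(List<Solution> nests, int count);

    Solution run() {
        List<Solution> nests = initialSolutions();
        for (int gen = 0; gen < MAX_GENERATION; gen++) {
            int i = rng.nextInt(nests.size());                 // nest picked at random
            int j = rng.nextInt(nests.size());                 // cuckoo's candidate
            Solution improved = levyImprove(nests.get(j));
            if (improved.totalDistance() < nests.get(i).totalDistance()) {
                nests.set(i, improved);                        // keep the better solution
            }
            rankByFitness(nests);
            replaceWorst(nests, (int) (P_A * HOST_NESTS));     // abandon the 3 worst nests
            rankByFitness(nests);
        }
        rankByFitness(nests);
        return nests.get(0);                                   // best solution of the run
    }
}
```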

Generation of the initial solutions

A randomization pattern is applied to generate each initial solution of the population. A vehicle is chosen at random, with the constraint that it must be able to serve the currently unassigned customer with the least demand; if it cannot, the program loops until a vehicle with adequate capacity is selected.

Next, an unassigned customer is randomly chosen and added to the vehicle's route. If the customer–vehicle pairing does not satisfy the problem constraints, the program moves on to the next possible pairing. Otherwise, the function removes the customer from the pool of unassigned customers, adds it to the vehicle's route, updates the vehicle's remaining capacity, and moves on to the remaining unassigned customers and vehicles.

A sample representation of a solution \(S\) for the CVRP, as produced by this implementation, is shown in Fig. 3. This entire procedure is repeated \(n\) (the number of host nests) times.

Fig. 3 Sample solution representation of a problem instance with 16 customers and 8 vehicles

When the initial solutions have been obtained, their fitness values are calculated and the solutions are ranked.
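A sketch of this randomized construction, with illustrative field and method names that are not taken from the authors' code, is given below; like the description above, it assumes that some vehicle always retains enough residual capacity for the smallest remaining demand:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of the randomized construction of one initial solution (illustrative).
class InitialSolutionBuilder {
    private final Random rng = new Random();

    // routes.get(k) holds the customer indices assigned to vehicle k.
    List<List<Integer>> build(int[] demand, int vehicles, int capacity) {
        List<List<Integer>> routes = new ArrayList<>();
        int[] remaining = new int[vehicles];
        for (int k = 0; k < vehicles; k++) {
            routes.add(new ArrayList<>());
            remaining[k] = capacity;
        }
        List<Integer> unassigned = new ArrayList<>();
        for (int c = 1; c < demand.length; c++) unassigned.add(c);   // 0 is the depot

        while (!unassigned.isEmpty()) {
            // Smallest unassigned demand: the chosen vehicle must at least fit it.
            int minDemand = unassigned.stream().mapToInt(c -> demand[c]).min().getAsInt();

            // Randomly pick a vehicle, looping until it can serve that least demand.
            int k;
            do {
                k = rng.nextInt(vehicles);
            } while (remaining[k] < minDemand);

            // Randomly pick an unassigned customer; if the pairing violates capacity,
            // move on and try another pairing in the next pass.
            int customer = unassigned.get(rng.nextInt(unassigned.size()));
            if (demand[customer] <= remaining[k]) {
                routes.get(k).add(customer);
                remaining[k] -= demand[customer];
                unassigned.remove(Integer.valueOf(customer));
            }
        }
        return routes;
    }
}
```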

2-opt operation

2-opt is a popular and simple local search operation (Chang 2015) first introduced by Croes (1958) for the single-objective TSP. Its characteristics include the following: it is applicable to both symmetric and asymmetric problems with random elements; it does not rely on subjective decisions, so it can be completely mechanized; it is appreciably faster than the other methods proposed at the time; and it can be terminated at any point where the solution obtained so far is deemed sufficiently accurate.

In the TSP, 2-opt improves a random initial tour by exchanging two of the edges in the tour with two other possible edges. For example, the operation will select two edges \(\left( {u_{1} , u_{2} } \right)\) and \(\left( {v_{1} , v_{2} } \right)\) from the tour—where \(u_{1}\), \(u_{2}\), \(v_{1}\), and \(v_{2}\) are distinct and appear in this order in the tour—and will replace these edges with the edges \(\left( {u_{1} , v_{1} } \right)\) and \(\left( {u_{2} , v_{2} } \right)\), provided that this change will decrease the length of the tour (Englert et al. 2014). The operation is repeated until no more improvements can be made.

In the capacitated vehicle routing problem, as implemented here, an attempt to improve the current solution is made by swapping two customer positions: two routes are chosen, one customer is singled out from each, and the two are exchanged. This consequently affects the solution quality.
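A minimal Java sketch of this customer-exchange move, under the assumption that the move is rejected whenever either route would exceed its capacity (names are illustrative, not the authors' code):

```java
import java.util.List;
import java.util.Random;

// Sketch of the 2-opt-style move used here: pick one customer from each of two
// routes and swap them, keeping the move only if both routes stay within capacity.
class TwoOptSwap {
    private final Random rng = new Random();

    boolean trySwap(List<List<Integer>> routes, int[] demand, int capacity) {
        int r1 = rng.nextInt(routes.size());
        int r2 = rng.nextInt(routes.size());
        List<Integer> a = routes.get(r1), b = routes.get(r2);
        if (r1 == r2 || a.isEmpty() || b.isEmpty()) return false;

        int i = rng.nextInt(a.size());
        int j = rng.nextInt(b.size());
        int ca = a.get(i), cb = b.get(j);

        // Capacity check after exchanging the two customers (constraint (3)).
        int loadA = a.stream().mapToInt(c -> demand[c]).sum() - demand[ca] + demand[cb];
        int loadB = b.stream().mapToInt(c -> demand[c]).sum() - demand[cb] + demand[ca];
        if (loadA > capacity || loadB > capacity) return false;

        a.set(i, cb);
        b.set(j, ca);
        return true;   // caller compares distances and keeps the swap only if it improves
    }
}
```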

Double-bridge operation

The double-bridge is a mutation operator used in genetic algorithms. It exchanges four edges in a specific pattern (Handl et al. 2016), allows large-scale changes in a tour, and is a move that cannot be built from a local sequence of 2- and 3-changes (Martin et al. 1991).

An example of the specific pattern of the exchange of four edges in the traveling salesman problem is as follows: the edges \(\left( {a, b} \right)\), \(\left( {c, d} \right)\), \(\left( {e, f} \right)\), and \(\left( {g, h} \right)\) in the tour are replaced by the edges \(\left( {a, f} \right)\), \(\left( {c, h} \right)\), \(\left( {e, b} \right)\), and \(\left( {g, d} \right)\) (Ouaarab et al. 2014).

The double-bridge operation differs from the 2-opt operation only in that it swaps four customers instead of two. Because of the nature of this operation, each problem instance used must have at least four vehicles.
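One possible reading of this four-customer exchange is a cyclic swap of one customer from each of four distinct routes; the sketch below follows that assumption (it is not the authors' code) and omits the capacity check for brevity, although constraint (3) must still be enforced in practice:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of a double-bridge-style move: one customer is taken from each of
// four distinct routes and the four customers are exchanged in a cycle.
class DoubleBridgeSwap {
    private final Random rng = new Random();

    void apply(List<List<Integer>> routes) {
        if (routes.size() < 4) throw new IllegalArgumentException("needs >= 4 routes");

        // Pick four distinct, non-empty routes at random.
        List<Integer> idx = new ArrayList<>();
        for (int k = 0; k < routes.size(); k++) if (!routes.get(k).isEmpty()) idx.add(k);
        if (idx.size() < 4) return;
        Collections.shuffle(idx, rng);
        List<Integer> r = idx.subList(0, 4);

        // Pick one customer position in each chosen route.
        int[] pos = new int[4];
        for (int k = 0; k < 4; k++) pos[k] = rng.nextInt(routes.get(r.get(k)).size());

        // Rotate the four customers: route0 <- route3, route1 <- route0, and so on.
        int carried = routes.get(r.get(3)).get(pos[3]);
        for (int k = 0; k < 4; k++) {
            carried = routes.get(r.get(k)).set(pos[k], carried);   // set() returns the old value
        }
    }
}
```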

Experimental results

This section presents the results of the computational experiments carried out to determine the performance of the CS algorithm applied to the CVRP. The algorithm was coded in the Java programming language and was run mainly on a 1.9-GHz system with 4 GB of RAM.

A comparison between the best known solutions from Augerat et al. (1995) and the best solutions obtained by the CS algorithm applied in this study, for each of the selected problem instances from that benchmark dataset, is shown in Table 2 (set A problem instances), Table 3 (set B problem instances), and Table 4 (set P problem instances).

Table 2 The known best solution and the obtained best solution for problem instances under set A of the Augerat et al. (1995) benchmark dataset
Table 3 The known best solution and the obtained best solution for problem instances under set B of the Augerat et al. (1995) benchmark dataset
Table 4 The known best solution and the obtained best solution for problem instances under set P of the Augerat et al. (1995) benchmark dataset

The bold values in Tables 3 and 4 indicate the solutions obtained by the applied algorithm that match or are close to the best known solutions from the literature. Of these closest solutions, one (P-n16-k8) produced the same set of routes as reported in the literature and would have matched the reported length were it not for a difference in how the values are computed.

Most of the solutions, however, are far from those reported in the literature; hence, we report that this study's implementation of the CS algorithm was not effective. The results obtained for these problem instances are significantly larger than the best known solutions. The large difference could be attributed to several factors, such as the parameter settings (e.g., the number of iterations), the interpretation and application of the Lévy flights, the operations used (2-opt and double-bridge), or simply the nature of random walks and metaheuristics (Yang and Deb 2010).

To examine the first of these factors, we tested whether increasing the number of iterations per run affects the quality of the solutions generated. Increasing the number of iterations to 10,000–30,000 allows the CS algorithm to obtain better solution lengths than with only 500 iterations for both of the larger problem instances P-n40-k5 and P-n101-k4. At 30,000 iterations, the best solution length obtained for P-n40-k5 was 581.15897, a relative error of 0.2689 with respect to the best known solution; for P-n101-k4 at the same 30,000 iterations, the best solution length obtained was 1044.22480, a relative error of 0.5088 with respect to the best known solution. This suggests that the convergence of the CS algorithm to the optimal solution may be significantly slower for larger problem instances. The other factors are left for future studies.
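For clarity, the quoted relative errors are consistent with the usual definition of the gap relative to the best known length; under that assumption, the 0.2689 figure for P-n40-k5 corresponds to the commonly cited best known length of 458 for that instance:

$${\text{relative error}} = \frac{{D_{\text{obtained}} - D_{\text{best known}} }}{{D_{\text{best known}} }},\quad \frac{581.159 - 458}{458} \approx 0.2689.$$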

Conclusion

The application of the CS algorithm to the CVRP in this study was not effective in achieving desirable results on the problem instances from the Augerat et al. (1995) benchmark dataset, most notably the large ones; the exceptions are a select few small problem instances whose values match or are close to those in the literature. Such results may be attributed to a number of factors. With this in mind, we explored changing the execution setup of the program, in particular the number of iterations in each run. The results show that a high number of iterations produces significantly better results than the originally set number. This suggests that the convergence of the CS algorithm to the optimal solution may be significantly slower for larger problem instances, and perhaps proportional to the size of the problem instance.

Future work includes an in-depth look into possible modifications to the algorithm. Additionally, instead of applying 2-opt and double-bridge a fixed number of times, another option is to let the operators keep attempting to improve the solution, using a tolerance value as the control. Considering other solution-improvement operations is also an option. Minimal parameters, a strong search pattern, and a form of elitism achieved by removing the worst nests all contribute to the excellent performance of the CS algorithm reported in various studies. Even so, aside from Ouaarab et al.'s (2014) application of an improved, discrete version of the CS algorithm to the traveling salesman problem, there is little literature on the performance of the CS algorithm on routing problems; hence this study.