1 Introduction

Route scheduling is one of the most important real-life problems and plays a pivotal role in transportation, supply chain management and logistics. Its practical applications encompass the bus route planning, post, food and beverage delivery, cash delivery to banks and ATM terminals, industrial waste collection, maintenance operations, and many more. While constructing the routing schedule for a given distribution problem, it is necessary to consider a large number of practical issues, e.g., the available fleet size, truck capacities, travel costs between geographically dispersed customers, possible time intervals in which customers should be visited, and numerous other circumstances.

Minimizing the number of trucks and their total distance traveled during the service contributes to reducing the fleet exploitation costs and fuel consumption. Also, it lessens the price of delivered goods, since transportation expenses constitute a significant percentage of their value [26]. Moreover, it can help reduce the environmental pollution and traffic congestion, which is an important concern nowadays [29]. Numerous variants of vehicle routing problems (VRPs) emerged in order to reflect real-life scheduling scenarios [20]. In the multiple traveling salesperson problem (\(m\)TSP), which is an extension of a standard traveling salesperson problem, more than one salesperson is allowed to be exploited in the solution [7]. Here, each customer has assigned a non-negative demand, which should be satisfied by the visiting salesperson. The fleet size (corresponding to the number of salesmen in \(m\)TSP) becomes an important criterion of optimization, apart from the total travel costs. Taking into account the limited capacities of the vehicles led to formulating the capacitated vehicle routing problem (CVRP). In CVRP, the fleet is composed of a number of vehicles with well-defined, possibly different load capacities, which cannot be exceeded in a feasible solution. In real-life transportation problems, it is common that customers want to get their orders within a specified time interval. The vehicle routing problem with time windows (VRPTW) addresses this issue by incorporating additional constraints concerning delivery time [33].

State-of-the-art techniques to solve the VRPTW are divided into exact and approximate methods. Since the VRPTW is NP-hard, the former approaches can be applied only for relatively small problem instances. Hence, various heuristic algorithms (both sequential and parallel), that do not guarantee obtaining the optimal solution, have been introduced over the years for solving the VRPTW in a reasonable time. They include, among others, simulated annealing [69], tabu search algorithms [28], ant colony systems [26], swarm optimization algorithms [30], evolutionary approaches [57], and many more. Recently, a number of efficient genetic and memetic algorithms—both sequential and parallel—have been proposed for the VRPTW [25, 46, 50, 66].

Memetic algorithms (MAs), which are built upon a population-based approach, and balance exploration with exploitation of the search space, have attracted research attention, and have been applied for solving the VRPTW recently [46, 50, 66]. They were shown to be very efficient in solving large-scale VRPTW instances. In order to speed up the computations, and to explore larger regions of the search space, we had proposed and recently improved an island-model parallel memetic algorithm for the VRPTW [8, 12, 4749]. Although we demonstrated its efficacy, its main drawback lies in selecting a proper co-operation scheme of parallel processes (i.e., migration topology, interval, the immigration/emigration policy, and the number of emigrants). In this paper, we carefully investigate the proposed co-operation schemes to determine their impact on the parallel MA performance. Our intensive experimental study conducted on the standard benchmark Gehring and Homberger’s (GH) set (with 1000 customers) gave detailed insights into the schemes’ performance and behavior. We report 19 new world’s best solutions to the GH tests obtained with the use of best co-operation schemes in our parallel MA.

1.1 Contribution

Although parallel EAs (PEAs) proved to be very efficient in solving a variety of optimization problems in many fields, establishing a proper co-operation scheme of parallel processes which ensure good convergence of the algorithm is not a trivial task. Improperly determined migration topology and interval can easily jeopardize the algorithm performance and computation time. In this paper, we propose to apply co-operation schemes which can help improve both exploration and exploitation capabilities of our parallel memetic algorithm (PMA-VRPTW) to solve the largest-scale VRPTW benchmark problem tests (with 1000 customers to serve). We expand our previous research [47], in which we determined three most promising migration topologies for PMA-VRPTW (based on results obtained for 400-customer tests). Here, we apply these schemes in PMA-VRPTW, and perform the in-depth analysis to find out how they influence the algorithm behavior. Also, we incorporate various migration intervals, and investigate their impact on the PMA-VRPTW convergence. We report new world’s best solutions for 19 (out of 60) 1000-customer Gehring and Homberger’s tests obtained in our extensive experimental study. Finally, we give clear guidelines on how to select the best co-operation scheme in PMA-VRPTW based on the test characteristics.

1.2 Paper Outline

The outline of this paper is as follows. The problem is formally defined in Sect. 2. Section 3 reviews the state-of-the-art algorithms for the VRPTW, and shows current advances in PEAs. Section 4 discusses in detail PMA-VRPTW, along with the co-operation schemes. The results of an extensive experimental study performed on the standard Gehring and Homberger’s benchmark set are reported and analyzed in Sect. 5. Finally, Sect. 6 concludes the paper, summarizes the findings, and highlights directions of our future work.

2 Problem Formulation

The VRPTW is a problem of serving \(M\) customers by \(K\) vehicles of a constant capacity \(Q\). There is a single depot (\(v_{0}\)), which is the start and the finish point of each route. The customers \(v_{i}\), \(i \in \{1,2,\ldots ,M\}\), have assigned their own service times \(s_{i}\), \(i \in \{1,2,\ldots ,M\}\). Serving the depot does not take any time (\(s_0=0\)), whereas the customer service times are non-negative. A non-negative demand \(d_{i}\), \(i \in \{1,2,\ldots ,M\}\), is given for each customer. A geographic dispersion of customers is known, and the travel costs between each pair of travel points are given as \(c_{(i,j)}\), where \(i \ne j\), and \(i,j \in \{0,1,\ldots ,M\}\). Additionally, each customer and the depot specifies its earliest and latest time of starting the service (i.e., time window), \(e_{i}\) and \(l_{i}\) respectively (\(i \in \{0,1,\ldots ,M\}\)).

Formally, the VRPTW is defined on a directed graph \(G=(V,E)\) with a set \(V\) of \(M+1\) vertices representing the customers and the depot, along with edges \(E=\{\langle v_{i},v_{(i+1)}\rangle |v_{i},v_{(i+1)}\in V, v_{i} \ne v_{(i+1)}\}\), representing the travel connections. The \(i\)th route is defined as an ordered list of \(m_i\) customers served by a single vehicle: \(r_i=\langle v_{0},v_{(r_i(1))},\ldots ,v_{(m_i+1)}\rangle \), where \(v_{0} = v_{(m_i+1)}\) is the depot, and \(v_{(r_i(j))}\) is the \(j\)th customer visited within \(r_i\).

2.1 Objectives and Constraints

The primary objective is to minimize \(K\) (\(K\ge \left\lceil D/Q\right\rceil \), where \(D=\sum _{i=1}^M{d_i}\)). Secondly, the traveled distance \(T\) is to be minimized, where \(T\) is given as

$$\begin{aligned} T=\sum \limits _{i=0}^M { \sum \limits _{j=0}^M { \sum \limits _{k=1}^K {x_{(i,j,k)}c_{(i,j)}}}}. \end{aligned}$$
(1)

If the \(k\)th vehicle travels from \(v_i\) to \(v_j\), then \(x_{(i,j,k)}=1\) (\(0\) otherwise).

Every customer is visited exactly once, and each route starts and finishes at \(v_0\). The capacity and the time window constraints must hold for each route. The total amount of goods delivered to customers cannot therefore exceed \(Q\), and the service of each customer must start before its time window closes. If a vehicle visits \(v_{(i+1)}\) within its time window (\(e_{(i+1)}\le a_{(i+1)}\le l_{(i+1)}\), where \(a_{(i+1)}\) is the arrival time at \(v_{(i+1)}\)), then the service is immediate. The vehicle may arrive at \(v_{(i+1)}\) before \(e_{(i+1)}\), but the service cannot initiate before the time window opens (there is the waiting time \(w_{(i+1)}\)). If a vehicle visits \(v_{(i+1)}\) after closing its time window (\(a_{(i+1)}>l_{(i+1)}\)), then the service is not feasible.

Let \(\sigma _A\) and \(\sigma _B\) be two VRPTW solutions. Then, \(\sigma _A\) is of a higher quality than \(\sigma _B\), if (\(K(\sigma _A) < K(\sigma _B)\)), or (\(K(\sigma _A) = K(\sigma _B)\) and \(T(\sigma _A) < T(\sigma _B)\)).

2.2 Example

An exemplary solution \(\sigma \) of the VRPTW instance containing \(25\) customers is presented in Fig. 1. This solution consists of three routes (\(r_1\), \(r_2\), and \(r_3\)): \(r_1=\langle v_0,v_8, v_{10},v_{12},v_{21},v_{22},v_{23},v_{24},v_{25},v_{17},v_{14},v_{0}\rangle \) (\(10\) customers are visited), \(r_2=\langle v_0,v_{11},v_{15},v_{19},v_{20},v_{18},v_{16},v_{13},v_{9},v_{7},v_{0}\rangle \) (\(9\) customers), and \(r_3=\langle v_0,v_6,v_2,v_1,v_4,v_3,v_5,v_{0}\rangle \) (\(6\) customers). It is easy to see that each customer \(v_i\), \(i \in \{1,\ldots ,25\}\), is served exactly once. Assuming that the vehicle loads do not exceed the maximum capacity in any route, and the time window constraints are not violated, this routing schedule \(\sigma \) is feasible.

Fig. 1
figure 1

An exemplary solution to the VRPTW instance with 25 clients served in three routes

3 Related Literature

3.1 Vehicle Routing Problem with Time Windows

Due to its wide practical applicability, the VRPTW attracted research attention over the years. Although the optimal solutions can be obtained using exact algorithms, their computation time is not acceptable for large-scale problem instances. Thus, a plethora of heuristic algorithms have been proposed, which find high-quality (but not necessarily optimal) solutions in a reasonable time.

Exact algorithms for the VRPTW inherit from works devoted to solving the TSP [33]. In a majority of them, minimizing the travel distance is considered as the single objective. Desrochers et al. proposed to formulate the VRPTW as a set partitioning problem [21], where all feasible routes are considered implicitly. Since the number of possible routes rapidly grows for an increasing number of customers, only a subset of all routes is included in the model. Then, a relaxed shortest path problem is solved to verify if there are feasible routes which decrease the total travel distance. A branch-and-cut procedure for minimizing the number of vehicles was presented in [6]. Irnich and Villeneuve adopted the elementary shortest path problem with resource constraints for the VRPTW [31]. Various VRPTW formulations, including the path, arc, arc-node, and spanning tree formulations, have been studied by several authors [3, 15, 23, 36, 37, 58]. Exact methods were summarized in a bunch of thorough surveys and reviews [4, 19, 22, 33].

Although exact algorithms for solving the VRPTW are still being developed, they are not applicable for large-scale problem instances. Also, they are strongly dependent on the time window characteristics of a specific test case [66]. In contrary to exact methods, in heuristic techniques two VRPTW objectives are usually considered independently. Two-stage techniques (both sequential and parallel), in which \(K\) is minimized at first, and then \(T\) is optimized, are of a high research interest. They enable designing effective algorithms for both optimization stages independently.

Heuristic methods can be grouped into construction and improvement techniques. In the former case, unserved customers are iteratively inserted into a partial solution [5355, 62, 64]. Alternatively, improvement heuristics modify an initial solution to explore new regions of the solution space, e.g., by applying some local search procedures [13, 45]. Meta-heuristic algorithms, which often embed methods for exploring the search space coupled with local search improvement techniques used for its exploitation, allow for existing infeasible solutions and deteriorating their quality temporarily. These approaches include simulated annealing [16], tabu searches [28], ant colony optimization [18], swarm optimization algorithms [30], and many other techniques [5, 13].

EAs have been extensively explored for the VRPTW [57]. In genetic algorithms (GAs), solutions (chromosomes) are successively optimized in the biologically-inspired manner. Chromosomes are selected, crossed-over, and mutated. Memetic algorithms (MAs)—also known as hybrid genetic algorithms—are built upon a similar approach. They combine EAs for the search space exploration, along with refinement procedures applied to exploit solutions already found [46, 50]. Vidal et al. recently proposed an efficient hybrid genetic algorithm which evolves both feasible and infeasible solutions [66]. It is worth mentioning that MAs have been applied to a wide spectrum of other optimization and pattern recognition problems [27, 32, 38, 39, 41, 51].

3.2 Parallel Evolutionary Algorithms

Parallel evolutionary algorithms (PEAs) became a vital research topic thanks to their applicability and availability of various parallel architectures and services, ranging from multicore personal computers, massively parallel graphics processing units, to clusters and computation clouds. The early days of parallel EAs have been summarized in thorough surveys and reviews [2, 14]. The implementation issues of PEAs along with their computation models were discussed in an excellent survey by Alba and Tomassini [1].

There exist a number of parallel computation models defining how distributed algorithm components co-operate during the PEA execution [63]. In the simplest PEAs, parallel algorithm components run independently without any communication. Finally, the results obtained during the execution are collected to determine the highest-quality solution. This model can be considered as a batch model, in which sequential executions of an EA are batched and executed on a parallel machine to speed up the computation. In master-slave models, a single machine in the system is promoted to act as the master, which distributes the workload to other working machines.

Communication of parallel processes is crucial to guide the search during the PEA execution. Since some processes may have already reached the promising regions of the search space, it is beneficial to distribute this knowledge among other processes which could have stuck in local minima. In the most popular parallel models—island models (also referred to as distributed EAs)—each process (an island) optimizes its population independently from other islands. If the islands run the same EA, then the system is homogeneous (it is heterogeneous otherwise). The solutions already found in the system are periodically exchanged between the islands. This co-operation is defined by the migration topology and interval (what is the communication path and when the co-operation is performed), immigration/emigration policy (what happens with the obtained/sent solutions), and the number of solutions (migrants) sent between each two islands. The choice of each design setting from the above-mentioned ones is not trivial and significantly affects the PEA performance and behavior. There exist variants of an island model, in which every island contains a single individual in the population, and the mating of an individual is limited to its neighbors. These models, exposing a fine-grain parallelism, are referred to as cellular EAs. In co-operative co-evolutionary GAs, optimization problems are divided into subproblems which are independently solved by different parallel populations evolved in (possibly different) EAs. The parallel components communicate later to build a complete solution from concurrently obtained solutions of the subproblems. An excellent review of parallel models and their advantages and disadvantages has been published recently [63].

Similarly to their sequential counterparts, PEAs have been continuously being employed to a plethora of (single- and multi-objective) optimization problems. These applications include, but are not limited to, various scheduling [42, 67], routing [68], and assignment problems [40, 61], tomography reconstruction [17], classification problems [60], and many others [52, 59, 65]. Recently, we had proposed and then improved the parallel memetic algorithm for solving the VRPTW [912, 4750]

4 Parallel Memetic Algorithm

4.1 Algorithm Outline

In PMA-VRPTW (Algorithm 1), each individual \(p_i\), \(i\in \{1,2,\ldots ,N\}\), corresponds to a solution \(\sigma _i\) with \(K\) routes (\(K(p_i)=K(\sigma _i)=K\), and \(T(p_i)=T(\sigma _i)\))Footnote 1 in a population of size \(N\) (on an island). The initial population is generated using a parallel guided ejection search (P-GES). Its sequential counterpart was proposed in [45], and later improved and parallelized in our recent works [49]. P-GES minimizes \(K\) at first, and then is used to create initial populations (lines 12). They are evolved in PMA-VRPTW to optimize \(T\) (lines 328).

Following the taxonomy discussed in Sect. 3.2, PMA-VRPTW is classified as a homogeneous island-model parallel memetic algorithm, since each island (a parallel process) runs the same MA to minimize \(T\), and the islands communicate to exchange the knowledge acquired up to date.

figure a

4.2 Initial Population

In P-GES (an island-model heuristics), each customer is served within a separate route at first (\(K=M\)). P-GES repeatedly attempts to decrease \(K\) by one at a time. The customers from a random route \(r\) are inserted into the ejection pool (EP) (a set of unserved customers). The attempts of re-inserting them into the solution are undertaken. If there are no feasible insertions for a customer popped from the EP, then the other ones are removed from \(\sigma \) and put into the EP. The \(n\) islands co-operate to exchange their best solutions.

P-GES executes until \(K=\left\lceil D/Q\right\rceil \) (see Sect. 2), or its computation time exceeds the limit \(\tau _K\). Then, it is executed until \(N\) solutions with \(K\) routes are found for each (out of \(n\)) island, or its execution time surpasses \(\tau _N\). Finally, the maximum computation time of minimizing \(K\) and generating the initial population is \(\tau _I=\tau _K+\tau _N\). It is easy to note that PMA-VRPTW is not dependent on P-GES, and it can be thus conveniently replaced by another, perhaps more efficient heuristics (either sequential or parallel) to minimize \(K\).

4.3 Selection

Once the initial population is created, it evolves with time to minimize the total travel distance. First, \(N\) pairs (\(p_a,p_b\)) of individuals from the \(i\)th generation \(G_i\) are selected to create the \((i+1)\)th generation \(G_{(i+1)}\), according to the selection scheme (Algorithm 1, line 7, see also Fig. 2). In this paper, we utilize the AB-selection (AB), which proved to have high exploration capabilities [34, 50]. Here, each individual \(p_{i}\), \(i\in \{1,2, \ldots , N\}\), is selected as \(p_{a}\) at first. Then, the individual \(p_i'\) is chosen as \(p_{b}\), such that \(p_i\ne p_i'\). Each individual can be selected once as \(p_a\), and once as \(p_b\). This selection takes \(O(N)\) time.

Fig. 2
figure 2

Creation of a child in each island in PMA-VRPTW

4.4 Crossover

For each pair \((p_a, p_b)\), \(N_c\) children \(p_c\) are generated using the edge assembly crossover operator (EAX) (Algorithm 1, lines 1015), firstly introduced for the TSP [43], and later adapted to both the CVRP [44], and the VRPTW [46]. It takes \(\mathcal{T}_\mathrm{EAX}(M)=O(M^2)\) time, where \(M\) is a number of customers [8].

The EAX combines two feasible VRPTW solutions \(\sigma _a\) and \(\sigma _b\) (individuals \(p_a\) and \(p_b\)) consisting of \(K\) routes. Let the graphs \(G_a\) and \(G_b\) correspond to \(\sigma _a\) and \(\sigma _b\) (Fig. 3a, b). Then, the EAX operation comprises the following steps:

  1. 1.

    A new set ((\(E_a \cup E_b) \setminus (E_a \cap E_b\))), where \(E_a\) and \(E_b\) are the sets of edges in \(G_a\) and \(G_b\) (Fig. 3a, b), is found and forms the graph \(G_{ab}\) (Fig. 3c).

  2. 2.

    All \(G_{ab}\) edges are divided into AB-cycles (Fig. 3d–f), which consist of the \(G_{ab}\) edges traced alternately—the \(E_a\) edges are traced in the forward direction, whereas the others are traced reversely.

  3. 3.

    The E-set (\(E_S\))—a random AB-cycle (single mode, with the probability \(\mathcal {P}_s=0.5\)), or a combination of AB-cycles sharing at least one node (block mode, \(\mathcal {P}_b=1-\mathcal {P}_s\))—is found. We took the \(2^{nd}\) AB-cycle (Fig. 3d) as \(E_S\).

  4. 4.

    The intermediate solution is constructed from \(p_a\) by removing \(E_a\cap E_S\), and adding \(E_b\cap E_S\). Then, there are \(K\) routes, and possibly some infeasible subroutes, i.e., routes without the depot (Fig. 3g).

  5. 5.

    If the subroutes exist, then a random one to delete is chosen, and its customers are merged with other routes using 2-opt* moves (see Sect. 4.5). This process continues until all subroutes are removed (Fig. 3h).

Fig. 3
figure 3

Illustration of the EAX operator applied to \(p_a\) and \(p_b\): a the graph \(G_a\) corresponding to \(\sigma _a\) (light red), b \(G_b\) corresponding to \(\sigma _b\) (dark blue), c the edges from \(E_a\cap E_b\) are removed from \(E_a\cup E_b\) to form \(G_{ab}\) ((\(E_a\cup E_b)\setminus (E_a\cap E_b)\)), df six AB-cycles, g the intermediate solution (\((E_a\setminus (E_a\cap E_S))\cup (E_b\cap E_S\))), h \(G_c\) corresponding to \(p_c\) (Color figure online)

4.5 Repair, Education, and Mutation

The repair, education, and mutation procedures are the hill-climbing methods based on traditional neighborhoods for VRPTW [35, 46, 56], summarized in Table 1. Here, \(v_{(i-1)}\) denotes the predecessor of \(v_i\), and \(v_{(i+1)}\) is its successor in the route \(r_\alpha \). Similarly, \(v_{(j-1)}\), \(v_j\), and \(v_{(j+1)}\) are defined for \(r_\beta \).

Table 1 Local search moves applied in PMA-VRPTW

Let \(\mathcal {N}(\sigma )\) be the neighborhood of \(\sigma \) obtained by applying the mentioned moves. Since it encompasses a huge number of solutions, we limit its size by considering \(N_{v_i}\) nearest customers for \(v_i\) in each move. Calculating the load change after a move is obvious (\(O(1)\) time), whereas to evaluate the time window penalty we use the approach proposed in [46] (\(O(1)\) for 2-opt*, out-exchange, and out-relocate, \(O(m)\) for other moves, where \(m\) is the route size).

If \(\sigma _c\) is infeasible, then it needs to be repaired (double-lined box in Fig. 2), by performing local search moves to decrease the penalty \(\xi (\sigma )=\beta _1 P_c(\sigma ) + \beta _2 P_{tw}(\sigma )\), where \(P_c(\sigma )\) and \(P_{tw}(\sigma )\) represent the capacity and time window violations, multiplied by some coefficients \(\beta _1\) and \(\beta _2\). We set \(\beta _1=\beta _2=1.0\), and \(N_{v_i}=50\) as suggested in [46]. An infeasible route \(r\) is selected randomly from \(\sigma _c\) at first. Then, the set \(\bigcup _{v \in r} \mathcal {N}(\sigma _c,v)\) of subneighborhoods created for each customer that violates the constraints within \(r\), is determined. Finally, a new solution \(\sigma _c'\), \(\sigma _c'\in \bigcup _{v \in r} \mathcal {N}(\sigma _c,v)\), which minimizes \(\xi (\sigma )\), replaces \(\sigma _c\). This process executes until \(\sigma _c\) is feasible, or there are no further repair moves.

If \(p_c\) is feasible, then it is educated by feasible moves decreasing \(T(p_c)\). If there are no more improvement moves, the process stops. \(p_c\) is then mutated by at most \(I_M\) feasible moves (in this work we set \(I_M=300\)). The best feasible child \(p_c^B\) for this pair of parents is updated if it is necessary (Algorithm 1, line 13). Finally, \(p_c^B\) replaces \(p_a\) in the next generation if \(T(p_c^B)<T(p_a)\) (line 17).

The crossover, repair and education procedures—being the most time-consuming parts of the MA run by each island in PMA-VRPTW—require \(O(M^2)\) time. Performing these operations for \(N\) pairs of parents to generate \(N_c\) children takes \(\mathcal {T}_\mathrm{G}=c_o N M^2\) for a constant \(c_o\). The maximum number of generations is bounded by another constant \(c_g\), \(c_o<c_g\), due to the execution time limit. Hence, the time complexity of the MA run by each island is \(O(M^2)\).

It is worth to mention that we do not incorporate any diversity management procedures into the MA run by each island (e.g., regenerating the population if it encounters the diversity crisis). We avoid introducing new genetic material during the PMA-VRPTW execution in order to better verify how it copes with the local minima faced during the optimization (especially while applying various co-operation schemes discussed in Sect. 4.6).

4.6 Co-operation

The \(n\) islands in PMA-VRPTW co-operate periodically (according to the migration interval \(\delta \), Algorithm 1, line 18) to guide the search towards solutions of a better quality. This process is highlighted by a curly brace in Algorithm 1 (lines 1922). First, the emigrants (i.e., solutions to be sent from the island) are determined (line 19). The emigrants are then transferred to the neighboring island (according to the migration topology) (line 20). This operation may be done either synchronously or asynchronously. In the former case, the sending island waits until the send operation is finished. In this paper, we send the emigrants asynchronously, i.e., the algorithm execution at the sending island proceeds while the communication progresses. Additionally, the selected emigrants are transferred only if they have been updated since the previous co-operation (to avoid unnecessary data transfer).

After acquiring (one or more) immigrants, they are handled by the receiving island according to the policy defined by the co-operation scheme (line 22). They may (i) be appended to the population, (ii) replace some individuals at the receiving island, or (iii) be crossed-over with other individuals.

In our earlier work [47], we investigated different co-operation schemes:

  1. 1.

    Independent runs. Each island runs the MA independently without any co-operation. Finally, solutions are compared and the best one is chosen.

  2. 2.

    Pool. Each island sends to the nominated master island the best \(p^B\) in its population, and it is inserted into the pool. The master determines \(\eta N\), \(0<\eta <1\), best solutions, which replace \(\eta N\) random ones in each island.

  3. 3.

    Pool with EAX (P-EAX). The regular pool is enhanced by the EAX operator employed while handling \(\eta N\) immigrants by each island. The solution \(p_a\)—that is being replaced—is crossed-over with the immigrant \(p^I\), if \(p_a \ne p^I\), to generate \(p^I_c\). If \(p^I_c\) is feasible, then it replaces \(p_a\).

  4. 4.

    Ring. The migration topology constitutes a ring.

  5. 5.

    Randomized EAX (R-EAX). The regular ring is enhanced by the EAX operator. Also, the migration topology (order of islands in the ring) is randomized, and determined by the master before each co-operation.

  6. 6.

    Knowledge synchronization (KS). The nominated master island synchronizes the knowledge acquired by all islands during the search (i.e., distributes the best solutions among the islands).

An initial experimental study performed on 400-customer GH tests clearly indicated that the three last schemes (i.e., Ring, R-EAX, and KS) significantly outperformed the other ones in terms of the convergence time, and balancing exploration and exploitation capabilities of PMA-VRPTW [47]. The corresponding migration topologies of these schemes are given in Fig. 4, along with the other characteristics highlighted in Table 2.

Fig. 4
figure 4

Migration topologies: a Ring, b R-EAX (\(\mathcal {X}\) denotes the EAX operation), c KS. The master process (island) is rendered in light green (Color figure online)

Although the initial experiments gave a rough overview of the co-operation schemes’ performance, we did not consider the impact of the migration interval on the final results. Also, most demanding tests (with 1000 customers) were not investigated. Therefore, in this paper we aim at performing an in-depth analysis of the PMA-VRPTW behavior for each co-operation scheme. Additionally, we define two migration intervals (\(\delta =2\) and \(\delta =20\), given in the number of consecutive generations) to verify their impact on final solutions, and exploration and exploitation capabilities of the parallel MA.

Table 2 Summary of the investigated co-operation schemes

It is easy to see that the time complexity of PMA-VRPTW includes the time complexity of the applied co-operation scheme (see \(\mathcal {T}_\mathrm{C}\) in Table 2), and is given as \(\mathcal {T}_\mathrm{PMA}(n, M)=c_p (c_o N M^2 + c_c \mathcal {T}_\mathrm{C})\), where \(c_p\) and \(c_c\) are some constants. Since we impose the time limit on PMA-VRPTW (\(\tau _P\)), \(c_p\) is bounded by another constant (see the analysis in Sect. 4.5). Thus—in the worst case—the time complexity of PMA-VRPTW is \(O(M^2+nM^2)\) (for a constant population size \(N\)), if the R-EAX co-operation scheme is applied (see Table 2).

5 Experimental Results

5.1 Setup

The PMA was implemented in C++ using the Message Passing Interface (MPI) library. The source code was compiled using Intel 10.1 compiler and MVAPICH1 v0.9.9 MPI library. The performance experiments were conducted on Galera supercomputer (http://task.gda.pl/hpc-en/), whose total theoretical peak performance has been estimated to 50 TFLOPS. It is equipped with 1344 Intel Xeon Quad Core 2.33 GHz processors (5376 cores), each with 12 MB level 3 cache. The nodes were connected by the Mellanox InfiniBand DDR fat-free interconnect (throughput 20 Gbps, delay 5 \(\mu \)s). This supercomputer was executing Linux operating system.

PMA-VRPTW was executed on 96 processorsFootnote 2 (\(n=96\)), and its maximum execution time limits were as follows: \(\tau _K=10\) min., \(\tau _N=60\) min. (thus, \(\tau _I=\tau _K+\tau _N=70\) min.), and \(\tau _P=660\) min. (see Sect. 4 for more details). The population size and the number of children generated for each pair of parents in the MA (run by each island) were not changed (i.e., controlled) during the PMA-VRPTW execution, and were experimentally tuned to the following values: \(N=100\), and \(N_c=20\). For each GH instance, the algorithm was run \(5\) times for each co-operation scheme (and each migration interval). It gives at leastFootnote 3 (\(96\) processors\()\times (660\) min.\()\times (5\) runs\()\times (50\) GH tests\()\times (3\) co-operation schemes\()\times (2\) migration intervals\()= \) 1,584,000 CPU hours of the entire experimental study.

5.2 Dataset

PMA-VRPTW was tested on a classical benchmark set proposed by Gehring and Homberger [24], which reflects various real-life scheduling circumstances. All large-scale tests are split into subclasses, containing customers grouped into clusters (C subclass), dispersed randomly on the map (R subclass), and those with a mix of clustered and randomized customers (RC subclass). There are problems with relatively small vehicle capacities and short time windows (C1, R1, and RC1), and those with larger vehicle capacities and a longer scheduling horizon (C2, R2, and RC2) (see Table 3). These characteristics strongly influence the structure of final solutions, e.g., a larger fleet is necessary to serve customers in the case of small truck capacities and tight time windows.

Table 3 Overview of Gehring and Homberger’s test characteristics

There are problems with various \(M\)’s, \(M\in \{200,400,600,800,1000\}\). Each Gehring and Homberger’s (GH) subclass contains 10 problems (60 instances in total for each \(M\)). Tests are distinguished by their unique names: \(\alpha \_\beta \_\gamma \), where \(\alpha \) denotes the subclass, \(\beta \) relates to \(M\) (2 for 200, 4 for 400, and so forth), and \(\gamma \) is the test identifier (\(\gamma \in \{1,2,\ldots ,10\}\)). In this study, we focus on the most demanding 1000-customer GH tests. The world’s best (currently known) results for GH tests are summarized at the SINTEF website.Footnote 4 Note that the world’s best results is a set of solutions obtained using various algorithms (both sequential and parallel)—see the SINTEF website for details.

5.3 Analysis and Discussion

The first stage of PMA-VRPTW consists in minimizing the number of vehicles (\(K\)). We utilized the parallel guided ejection search (P-GES) to optimize \(K\) at first (within the time \(\tau _K\)), and then to generate an initial population of \(N\) solutions for each island (within \(\tau _N\). For details see Sect. 4). If \(\tau _N\) appears not enough to generate \(N\) solutions for an island, then the remaining solutions are obtained by mutating the already-found individuals (i.e., by applying \(200\) local search moves) (see Sect. 4.5).

It is worth noting, that solutions with lower \(K\) are usually characterized by a larger travel distance \(T\) (if \(K_{\alpha }<K_{\beta }\) then \(T_{\alpha }>T_{\beta }\) in most cases). Since we focus on the travel distance minimization in PMA-VRPTW, we omit \(10\) GH instancesFootnote 5 in the analysis, for which P-GES did not manage to find solutions with \(K=K_\mathrm{WB}\), where \(K_\mathrm{WB}\) is the world’s minimum number of routes, in order to provide a fair comparison. For each of the mentioned tests P-GES ended up with solutions containing \(K=K_\mathrm{WB}+1\) routes (for other tests \(K=K_\mathrm{WB}\)). Despite its significant time complexity [8], P-GES runs very fast in practice—the average times of generating a single individual in a population were (given in seconds): \(\tau _K^A=28.73\) (C1 class), \(\tau _K^A=91.74\) (C2), \(\tau _K^A=6.69\) (R1), \(\tau _K^A=17.53\) (R2), \(\tau _K^A=6.40\) (RC1), and \(\tau _K^A=181.80\) (RC2). These times are neglectable compared to the execution time of PMA-VRPTW. As already mentioned, PMA-VRPTW is independent from P-GES, and it can be easily replaced by a more efficient route minimization algorithm.

The best results (out of \(5\) independent runs) obtained using PMA-VRPTW for each GH instance, along with the best results averaged across the subclasses (for each problem instance) are given in Tables 4 and 5, for \(\delta =2\) and \(\delta =20\), respectively. The best \(T\)’s among co-operation schemes are rendered in boldface. Also, we present the results that are better than the current world’s best ones (they are indicated by the \(\star \) symbol). It is easy to see that the Ring and KS schemes significantly outperformed R-EAX for both migration intervals. This clearly indicates that the additional crossover of solutions (the immigrant and the best individual in an island’s population) does not result in a significant improvement of final solutions. However, for clustered customers with tight time windows (C1 class), the EAX structural changes appeared beneficial and R-EAX slightly outranked the other schemes (Ring was outdone by 0.24 % for \(\delta =2\), and 0.05 % for \(\delta =20\), and KS by 0.13 % for \(\delta =2\), and 0.15 % for \(\delta =20\)). This, in turn, shows that the local changes (resulting from the education and mutation) of best individuals were not able to improve locally-optimal solutions of tests with geographically grouped customers (and that the best solutions are close to the “feasibility border”). In contrary, both Ring and KS gave much better results than R-EAX for other GH subclasses. It is worth mentioning that PMA-VRPTW converged to the solutions better than already published for the entire R2 subclass, and for the majority of RC2 tests (see Ring and KS in Tables 4 and 5).

Table 4 The best results (out of 5 runs) obtained using PMA-VRPTW with various co-operation schemes (\(\delta =2\))
Table 5 The best results (out of 5 runs) obtained using PMA-VRPTW with various co-operation schemes (\(\delta =20\))

The average results (out of \(5\) runs for each configuration) are given in Tables 6 and 7 (similarly, the best resultsFootnote 6 are rendered in boldface, and those better than the world’s best ones are annotated with \(\star \)). The results show that Ring and KS provide the most stable results, and are able to guide PMA-VRPTW to asymptotically similar solutions (R-EAX reached very high-quality C1 solutions only). For frequent co-operation (\(\delta =2\)), KS turned out to be the best scheme on average, and managed to compensate the co-operation overhead by synchronizing the already-gained knowledge across the islands (this scheme is very exploitative). On the other hand, Ring is shown to be an explorative scheme, which is able to converge to the best results with relatively large migration interval (i.e., rare co-operation) by exploring a large part of the search space (\(\delta =20\)). A noteworthy observation is that PMA-VRPTW is able to improve the world’s best results of wide-time-window tests (R2 and RC2) with a very large probability (70 % of R2, and 43 % of RC2 average results have \(T<T_\mathrm{WB}\), in the case of both Ring and KS).

Table 6 The average results (out of 5 runs) obtained using PMA-VRPTW with various co-operation schemes (\(\delta =2\))
Table 7 The average results (out of 5 runs) obtained using PMA-VRPTW with various co-operation schemes (\(\delta =20\))

The results—both average and best—obtained using various co-operation schemes are summarized in Table 8 (we consider only tests for which \(K=K_\mathrm{WB}\)). In boldface are indicated the best results across co-operation schemes (separately for \(\delta =2\) and \(\delta =20\)). The results confirm that KS is the most appropriate scheme for frequent co-operation, whereas Ring for a rarer one. KS is very stable and converged to high-quality results (see #\(T^{A\star }\)). In Table 9, we present the best asymptotic results for selected GH instances (i.e., those for which PMA-VRPTW obtained results with \(T<T_\mathrm{WB}\))—the new world’s best results are shown in boldface.Footnote 7 Although Ring appeared to be very competitive (see #\(T_B\), %\(T_B\), and #\(T^{\star }\) in Table 8), it is KS (with \(\delta =20\)) which gave the asymptotically best solutions (Table 9). Thus, the exploration of the search space is well-balanced with exchanging the best solutions and their further exploitation. Too frequent migration (\(\delta =2\)) results in saturating islands with similar solutions which are not further improved during the search.

Table 8 Summary of the results obtained using PMA-VRPTW with various co-operation schemes: #\(T_B\)—number of the best \(T\)’s obtained with the corresponding co-operation scheme, %\(T_B\)—percentage of the best \(T\)’s, #\(T^{\star }\)—number of \(T\)’s such that \(T<T_\mathrm{WB}\) (the meaning of the results with \(A\) superscript—averaged \(T\)’s, see Tables 6 and 7—are analogous)
Table 9 The best results (out of 5 runs) obtained using PMA-VRPTW with various co-operation schemes for selected GH benchmark tests (for which PMA-VRPTW ended up with a solution such that \(T<T_\mathrm{WB}\))

In Fig. 5, we present the average convergence time (i.e., after which the best individual among all islands has not been improved) of PMA-VRPTW. It is easy to see that C1 and C2 tests can be solved much faster compared with the other GH instances. Here, applying R-EAX scheme is beneficial in terms of the solutions quality (it gave best asymptotic results), but it also increases the convergence time of PMA-VRPTW. Since R-EAX requires crossing-over individuals (which may result in infeasible children), and restoring their feasibility if necessary (in a time-consuming repair process), this scheme significantly affects the algorithm convergence capabilities. In contrary, KS scheme distributes the best individuals between islands relatively fast, and guides the search efficiently towards the most-promising parts of the search space.

Fig. 5
figure 5

The convergence times (in seconds) of PMA-VRPTW with various co-operation schemes applied, averaged for each GH subclass: a \(\delta =2\), b \(\delta =20\) (Color figure online)

Since the execution time of the algorithm (and more importantly—time of converging to acceptable solutions) is an important criterion in case of real-life applications, determining a proper co-operation scheme becomes a critical issue for emerging PEAs. In Fig. 6, we visualize a trade-off between minimizing the convergence time of PMA-VRPTW, and obtaining solutions with the highest possible quality. Clearly, R-EAX is the most time-consuming scheme (and introduces the largest co-operation overhead) among the considered ones, and choosing this scheme is advantageous only for C1 tests. Is is easy to see that applying KS (for both \(\delta =2\) and \(\delta =20\)) results in the fastest convergence of PMA-VRPTW in most GH subclasses. There are however subclasses for which Ring (with \(\delta =20\)) gave higher-quality asymptotic solutions on average (R1, RC1, RC2), but the convergence time is significantly larger here compared with the KS scheme (see Fig. 6c–f).

Fig. 6
figure 6

The front (the average \(T\) vs. the average PMA-VRPTW convergence time in seconds) for each GH subclass (the closer to the point \((0, 0)\), the better): a C1, b C2, c R1, d R2, e RC1, and f RC2 (\(\delta \) is given in brackets)

In Fig. 7, we present the average travel distance (for best co-operation schemes) for each subclass during the initial \(180\) min. of the PMA-VRPTW execution. This complements the results shown in Fig. 6, and gives a better insight into the PMA-VRPTW convergence capabilities. The small migration intervals are very beneficial at the beginning of the optimization process (see Fig. 7). If islands co-operate frequently, then the gained knowledge is distributed very fast. It can be later utilized to exploit known (already optimized) solutions. A noteworthy feature of the co-operation schemes with a larger migration interval applied (KS and Ring with \(\delta =20\)) is their ability to balance exploration and exploitation of the search space. Thus, they are able to outperform the other schemes asymptotically, and to give the best solutions in a longer optimization horizon.

Fig. 7
figure 7

The travel distance averaged for each GH subclass obtained using PMA-VRPTW (with the best co-operation schemes applied) within the initial 180 min. of execution: a C1, b C2, c R1, d R2, e RC1, and f RC2 (\(\delta \) is given in brackets)

6 Conclusions and Future Work

In this paper, we investigated the impact of the applied co-operation scheme (migration topology, interval, handling of immigrants, and emigrants) on the search capabilities of our parallel MA to solve the VRPTW. The extensive experimental study which comprised more than 1,584,000 CPU hours on an SMP cluster gave a detailed insight into the schemes’ characteristics and capabilities. Since real-life logistic and distribution problems are diverse at their heart (they differ by the customer locations, characteristics of time windows, vehicle capacities, and many more), we solved and analyzed the most-demanding 1000-customer benchmark tests reflecting various scheduling circumstances.

Concluding from our experiments, we provide the guidelines for choosing a proper migration topology and interval (see Table 10). If the execution (convergence) time of PMA-VRPTW is to be as minimum as possible, then the frequent co-operation should be favored to guide the search efficiently (KS). If there is no strict upper bound for the running time of PMA-VRPTW, then applying the rarer co-operation intervals benefits from a broader exploration of the search space (Ring), and gives the best asymptotic results.

Table 10 The guidelines for choosing the co-operation scheme in PMA-VRPTW

Our ongoing works are focused on combining two most successful co-operation schemes—KS and Ring—into an adaptive scheme. This scheme will dynamically control itself (its migration topology and interval) according to the current state of exploring the search space. Also, we work on the adaptive MA, in which the parameters (including the population size, selection scheme, and more) are controlled (and appropriately changed) on the fly. This approach will mitigate the necessity of performing a time-consuming tuning process of PMA-VRPTW parameters. Finally, we plan to apply PMA-VRPTW for other VRPs—especially the pickup and delivery problem with time windows.