Standardized validation of vehicle routing algorithms

Jastrzab, Tomasz; Myller, Michal; Tulczyjew, Lukasz; Blocho, Miroslaw; Kawulok, Michal; Czornik, Adam; Nalepa, Jakub

doi:10.1007/s10489-023-05212-0

Standardized validation of vehicle routing algorithms

Open access
Published: 05 January 2024

Volume 54, pages 1335–1364, (2024)
Cite this article

Download PDF

You have full access to this open access article

Applied Intelligence Aims and scope Submit manuscript

Standardized validation of vehicle routing algorithms

Download PDF

Tomasz Jastrzab¹,
Michal Myller¹,
Lukasz Tulczyjew¹,
Miroslaw Blocho²,
Michal Kawulok¹,
Adam Czornik¹ &
…
Jakub Nalepa¹

1150 Accesses
1 Altmetric
Explore all metrics

Abstract

Designing routing schedules is a pivotal aspect of smart delivery systems. Therefore, the field has been blooming for decades, and numerous algorithms for this task have been proposed for various formulations of rich vehicle routing problems. There is, however, an important gap in the state of the art that concerns the lack of an established and widely-adopted approach toward thorough verification and validation of such algorithms in practical scenarios. We tackle this issue and propose a comprehensive validation approach that can shed more light on functional and non-functional abilities of the solvers. Additionally, we propose novel similarity metrics to measure the distance between the routing schedules that can be used in verifying the convergence abilities of randomized techniques. To reflect practical aspects of intelligent transportation systems, we introduce an algorithm for elaborating solvable benchmark instances for any vehicle routing formulation, alongside the set of quality metrics that help quantify the real-life characteristics of the delivery systems, such as their profitability. The experiments prove the flexibility of our approach through utilizing it to the NP-hard pickup and delivery problem with time windows, and present the qualitative, quantitative, and statistical analysis scenarios which help understand the capabilities of the investigated techniques. We believe that our efforts will be a step toward the more critical and consistent evaluation of emerging vehicle routing (and other) solvers, and will allow the community to easier confront them, thus ultimately focus on the most promising research avenues that are determined in the quantifiable and traceable manner.

How to Generate Benchmarks for Rich Routing Problems?

A hybrid heuristic for a broad class of vehicle routing problems with heterogeneous fleet

Article 09 November 2017

A New Rich Vehicle Routing Problem Model and Benchmark Resource

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Solving rich vehicle routing problems (VRPs) is a vital research area due to their numerous real-life applications, including the delivery of cash to banks, industrial waste collection, school bus routing, first- and last-mile logistics, escape route planning, and many more [28, 40, 96]. In modern intelligent transportation systems (ITSes) [82], we aim at accommodating an increasing number of transportation requests while exploiting environmentally friendly approaches to reduce air pollution and road congestion [25, 47], especially in the city area. Thus, optimizing the routing schedules is an integral part of ITSes that are becoming essential in the smart city environments [32, 37].

The feasibility of routing plans is commonly affected by practical constraints, including the maximal vehicle capacities, time windows in which the transportation requests should be handled, various precedence and inter-request dependencies, and other constraints that are reflected in numerous VRP formulations [73, 77]. They encompass, among others, capacitated VRP (CVRP) [33], weighted VRP (WVRP), VRP with time windows (VRPTW), cumulated VRP with time windows (CumVRP-TW) [17], pickup and delivery with time windows (PDPTW) [44], multiple-depots VRPs [61], and VRPs with heterogeneous fleets [31]. A recently trending variant is the green VRP focusing on environmental issues [48, 62]. One of the aspects of this VRP is a fleet of electric vehicles [13] that introduces an additional constraint related to the vehicle’s recharging (either partial or full) along the route (also, there are combined variants, e.g., electric VRPTW [EVRPTW]) [21, 41]. There are formulations with occasional drivers that can complement the fleet of vehicles as well [15, 49].

Such rich VRPs are often considered to be the single- or multi-objective optimization problems [84, 93], with minimizing the number of vehicles feasibly serving the requests being the primary objective, and optimizing the total travel distance being the secondary objective. These objectives, however, do not reflect real-life use cases, and are often not enough to build the full picture on the abilities of the optimization algorithms. These metrics do not quantify various practical aspects of ITSes that are pivotal in industrial applications (including those related to the user perspective)—such aspects include, among others, the passengers’ satisfaction, fleet utilization, optimization time or convergence capabilities of the solvers. Note that if a solver converges to structurally-similar schedules, then it might be of an important practical use, especially in the scenarios in which external factors may trigger the updates of the routing schedule. Here, rebuilding the solution would be very costly, because it could incur the necessity of redirecting vehicles that are already in operation. Thus, utilizing classic quality measures indeed allows for comparing the emerging techniques, but it poses an important challenge once a practical transportation problem is to be tackled, as exploiting those metrics may lead to suboptimal or impractical choices^{Footnote 1}.

Although there is a battery of VRP solvers that can be utilized in a range of real-life formulations (we discuss such algorithms in Section 1.1), the emerging—deterministic, stochastic, and dynamic (e.g., being updated online)—VRP variants bloom, as they better reflect modern ITSes [50]. It triggers research in the area, and the new VRP solvers are being actively developed which, in turn, makes their thorough verification and validation a critical yet challenging task. We address this issue in the work reported here through proposing a comprehensive validation methodology that can shed more light on the functional and non-functional abilities of the optimization techniques for any rich VRP. Our contribution is discussed in detail in Section 1.2.

1.1 Related work

The algorithms for solving rich VRPs are split into approximate and exact methods. The latter techniques are commonly based on the column generation algorithms, and the branch-and-{bound, cut, price} approaches [3, 67]. Although they ensure obtaining optimal routing plans, such algorithms are only applicable to small-sized instances due to the NP-hardness of VRPs. On the other hand, approximate methods allow to elaborate high-quality, but not necessarily optimal solutions and are of high practical importance, as they can be conveniently applied to massively large VRPs. Additionally, such approaches are being actively developed in other applications areas [36], as they can lead to obtaining very high-quality solutions to challenging discrete optimization problems in an affordable time. These areas include, but are not limited to resource provisioning and task scheduling for heterogeneous applications in distributed green clouds [97], scheduling of crude oil operations in refineries [26], disassembly planning [16], solving industrial group scheduling [103] and job shop scheduling [8] problems, building commercial recommendation systems [91], bike repositioning in bike-sharing systems [30], energy-optimized partial computation offloading in mobile-edge computing [2], facial image manipulation [95], or allocating virtual machines in data centers [102].

Approximate techniques encompass the route and distance minimization approaches (RM and DM, respectively) which tackle two most important objectives of rich VRPs. The RM heuristics include the construction and improvement ones [38]. Construction algorithms successively build the solution from scratch. Typically, when none of the existing vehicles can handle a request without violating the constraints, a new one is attached to the operating fleet [43]. The improvement algorithms start with a low-quality solution and iteratively improve it through relocations or exchanges of route fragments [85]. The exchange heuristics involve intra-route and inter-route exchanges, which modify the connections within a single route or between different routes [41]. Additionally, there are hybrid solvers that couple different approximate algorithms [27], or exploit exact techniques to intensify the search within the most promising parts of the solution space [12, 23]. Finally, two-phase algorithms follow two general schemes: cluster-first route-second and route-first cluster-second. The main idea is to reduce the search space’s size by combining multiple transportation requests into clusters—this way, the problem becomes easier to solve [105]. Combining vehicles from different clusters is often not allowed, but adding non-clustered customers appearing along the way to a particular cluster is possible. Moreover, such algorithms enable us to seamlessly hire additional vehicles (drivers), perhaps at a higher cost, if there is no solution after the clustering phase. Hence, they can be effectively applied in ITSes in which transportation requests may appear outside the main traffic network.

The DM approaches are often built upon various (meta) heuristic techniques involving both population- and local search-based algorithms [4], with the latter operating on a single solution. The bio-inspired techniques encompass, among others, ant [66] or bee colony optimizations [22], particle swarm optimization [100], firefly [1, 65], single- and multi-objective evolutionary algorithms [81, 87], and bat techniques [60]. There exist approaches that hybridize local (intense) optimization with the global exploration of the solution space—memetic algorithms (also referred to as hybrid genetic algorithms) utilize local-search operators to effectively balance the search exploration and exploitation, and to ultimately converge to high-quality solutions faster [7, 83]. The algorithms that optimize a single solution (which is in contrast to the population-based techniques) involve a wide range of tabu searches [20, 39], as well as a variety of neighborhood search methods [10, 74, 92]. Finally, parallel techniques play an important role in solving different VRPs, as they can not only accelerate the computations [15, 54], but also allow to elaborate higher-quality routing schedules, e.g., through efficient cooperation of parallel solvers [6, 24, 51, 53, 59].

The heuristic algorithms for VRPs are of high practical importance, but they are often heavily parameterized. Their offline parameter tuning is a cumbersome and time-consuming task, as it commonly involves a trial-and-error procedure which requires executing a considered algorithm multiple times with different parameterizations. On top of that, incorrectly selected hyperparameters can easily deteriorate the performance of these methods. Additionally, a single parameter is tuned at a time in the majority of such approaches—it can result in suboptimal choices since the parameters are not independent. Although there are well-established tools allowing to automatically configure algorithms, such as irace [46], developing the approaches which employ run-time (self)adaptation of hyperparameter values is a blooming research area, as this approach may help better respond to the optimization status [71]. There are very successful algorithms that benefit from such a technique in other fields, and range from the approaches for developing the dendritic neural models for classification, approximation and prediction that utilize the Taguchi’s experimental design method for elaborating the desired parameterizations [18], online fault detection models and strategies [101], designing deep neural networks with the use of particle swarm optimization [45], improving the abilities of swarm techniques while enhancing their exploration competences through adaptively selecting the most important parameters [11], and many more [88]. A preprocessing-based adaptation was proposed by [57, 58]. Here, the idea was to analyze the characteristics of the problem instance and select the best algorithm parameters for a given test variant. In a similar vein, [9] argued that an appropriate initialization is pivotal for obtaining higher-quality VRP solutions and applied an array of machine learning techniques to learn how to use distinct features from four commonly used construction heuristics for VRPs. An interesting research pathway includes the adaptations that benefit from the historical data collected through GPS devices [104]. Finally, dynamic evolutionary techniques control the hyperparameter values during the execution using three common approaches: in a (i) deterministic manner without any feedback from the running evolutionary solver, (ii) through the adaptive techniques, where there is a feedback loop that indeed affects the hyperparameter updates, and in a (iii) self-adaptive way, where the parameters are encoded into individuals and evolve during the optimization [55]. Although dynamic approaches can respond to the search progress, such adaptations are often dependent on specific instance characteristics, hence they require a thorough validation process that could prove their generalization abilities over different VRPs.

We have been witnessing an unprecedented tsunami of “novel” heuristics for not only VRPs, but for tackling combinatorial problems in general. In an excellent paper, [80] proved that virtually all such methods are based upon a metaphor of some natural or man-made process, and proposing new techniques in this line of research can be threatening to the field due to the lack of appropriate scientific rigor [80]. Additionally, [80] pointed out that “comparing different metaheuristic algorithms has so far been a largely unstructured affair, with testing procedures being determined on the fly and sometimes with a specific outcome in mind”. There have been attempts toward designing new benchmarks for specific VRPs [86, 99]. Uchoa et al. [86] introduced a set of 100 CVRP instances including transportation requests of different spatial distributions, and showed that they can be solved using state-of-the-art exact and heuristic techniques. On the other hand, a real-world mail delivery case of the city of Artur Nogueira was investigated by [99]. In neither of these works, the authors introduced the validation methodology that could be used to investigate the emerging techniques ([86] analyzed, however, the impact of the instance characteristics on the performance of specific techniques), and focused on proposing the VRP test instances (the way such instances are used in future works is not standardized). This is a serious flaw, as we know that our understanding of the future algorithm’s performance strongly depends on the way we validate it, and on how carefully we select test instances so that the generalization of algorithm performance on future instances can be inferred [78].

In Table 1, we present the recent algorithms for tackling various rich VRPs, alongside their experimental settings. Additionally, it is common that the authors compare the new algorithms with a selected one (or the best known routing schedules) through reporting the “gap” between the obtained and previously known solutions, calculated according to a specific metric, sometimes designed on the fly.

Table 1 The algorithms for solving VRPs are commonly verified over classic quality metrics, such as the number of routes ($K$) and the total travel distance ($T$)

Full size table

Table 2 Symbols and notations used in this paper

Full size table

1.2 Contribution

We can note that the validation procedure followed in the research papers varies across different studies, and there is no widespread acceptance in this regard in the field. Additionally, the practical aspects of the algorithms and resulting routing schedules are commonly neglected while validating new techniques in an unstructured way. We tackle these important issues and introduce a comprehensive methodology to verify and validate the existing and emerging approaches for solving rich VRPs in a multi-faceted way. This methodology can be considered “comprehensive”, as it clearly defines the most important aspects of such validation procedures, being (i) the approach for generating benchmark test instances of varying difficulty, (ii) a set of metrics (which reflect practical aspects of routing schedules) for quantifying the quality of obtained VRP solutions, and (iii) the procedure that enables us to evaluate techniques for solving VRPs in a reproducible and hands-free manner. Thus, we believe that our efforts will become an important step toward combating the reproducibility crisis in the artificial intelligence research [34] through standardizing the way the community validates the emerging algorithms for tackling rich VRPs. Overall, our contribution centers around the following points:

We propose an end-to-end procedure toward evaluating techniques for solving VRPs in a quantifiable, reproducible, and hands-free manner, with an intention of building a standardized methodology of confronting the VRP solvers (Section 2). It enables us to (i) perform quantifiable, reproducible, and traceable research, (ii) automatically generate VRP benchmarks that are solvable and characterized by varying difficulty, (iii) comprehensively investigate single- and multi-step solvers, and thoroughly validate each component of such algorithms in the quantitative, qualitative, and statistical manner thanks to its modular architecture.
We propose a benchmark generation procedure that allows us to elaborate VRP test instances of varying difficulty (that may be conveniently controlled through the generator’s parameterization), with a baseline solution proving their solvability (Section 2.2). Such instances may be located in a real-life coordinate system, and can adhere to any set of transportation constraints.
We propose a set of generic metrics that allow us to quantify the quality of obtained VRP solutions in terms of their real-life practical characteristics, such as profitability, customer satisfaction, and many more (Section 2.3.1). These metrics are useful in building an understanding of practical aspects of the VRP solvers that can ultimately influence the process of selecting a specific optimization algorithm for deploying it in the target ITS. It is an inevitable step toward designing efficient and user-centered ITSes that meet the user needs.
We propose a novel way of quantifying the distance between the VRP solutions via using new graph similarity metrics (Section 2.3.2). This quantification may be pivotal in understanding the convergence abilities of randomized solvers, and in confronting different optimizers. Note that minimizing the structural distance between solutions obtained using an investigated algorithm may be pivotal in dynamic VRPs, in which the external factors can affect the current routing schedule. In such scenarios, practitioners would likely select the solvers that lead to structurally-similar routing schedules, as the local changes are always easier to implement within the operating fleet of vehicles (if rerunning a solver over a slightly modified transportation instance led us to obtaining a significantly different set of routes, then it would be extremely costly to reschedule the vehicles that were already in operation).
We present the abilities of our framework in practice through investigating various state-of-the-art heuristics for solving PDPTW, being a challenging yet representative NP-hard variant of a rich VRP, capturing different optimization constraints that are common in other VRPs. We focus on both benchmark and real-life tests that were elaborated using our generation procedure (Section 3). Our experiments are backed up with a battery of visualizations that show that the design of the proposed framework makes exporting various artifacts straightforward, and they can be easily integrated with widely adopted tools. Additionally, we equip our paper with a video showing our analysis tools in a step-by-step manner (the video is available at https://gitlab.com/jnalepa/standardized_vrp) .

We hope that our contributions will be an important step toward building an established and standardized framework for thorough and fair validation of new VRP solvers, and thus allow the researchers escape the trap of testing their algorithms in an ad-hoc, unstructured and biased way. To avoid any misunderstanding, it is worth mentioning that in this manuscript we do not propose a new algorithm for solving a specific rich VRP—we contribute to the body of knowledge which focuses on verification and validation of such techniques in scientific and practical settings.

1.3 Paper structure

The remainder of this paper is structured as follows. Section 2 presents our framework for the automatic validation of the VRP techniques, together with our approaches toward the quantitative, qualitative, and statistical analysis of the investigated techniques. In Section 3, the results of our experimental study are presented and discussed in detail. The study is split into three experiments, the first investigating the functional capabilities of the entire validation pipeline through generating real-life PDPTW test instances in the Gliwice area, Poland, and ultimately solving them using selected algorithms from the literature. In the second experiment, we show how to objectively compare selected algorithms over the benchmark test instances, whereas the third experiment further proves the practical utility of our system through deploying it over the sample data collected from the existing Demand Responsive Transport system. Section 4 concludes the paper and highlights the most exciting future research directions which may be followed based on the results and ideas presented in this paper. Finally, Table 2 gathers the symbols and notations used in this paper.

2 Method

In Section 2.1, we present a high-level overview of our framework for the automatic validation of the VRP algorithms. Its pivotal components are discussed in detail in Section 2.2 (Benchmark Generator), and in Section 2.3, which highlights our approaches toward the quantitative, qualitative, and statistical analysis of the investigated techniques. Although the introduced pipeline is independent of the underlying VRP variant, we focus on PDPTW to provide easy-to-follow concrete examples of its real-life implementation.

2.1 Overview of the validation approach

The proposed validation framework consists of the benchmark provider, solver (being an algorithm that is undergoing the analysis), and the post-processing module. The first module, referred to as the Benchmark Generator (see Fig. 1 and Section 2.2), supplies the solver with already defined benchmarks or newly generated datasets, and combines the data from both sources. The solver includes three major components, marked in blue in Fig. 1: the Initial Solution Generator (ISG), and the RM and DM components. Each solver building block operates on the same set of benchmarks (passed through the in1 entry point), and produces a number of solutions (through the out exit point). Additionally, RM and DM start off either with a set of predefined solutions provided externally (e.g., from earlier solver runs), or with those that are supplied directly from one of the previous components (through in2). The post-processing module involves several elements (discussed in Section 2.3) run independently for each solver’s component, and finally for the solver as a whole.

There are three important advantages of the framework. Firstly, it considers the solver components as black boxes, allowing to automatically use and validate a wide range of algorithms (see Section 3). Secondly, the solver allows for a great dose of parallelism, both at the data and instruction level. Data-level parallelism stems from the fact that all the benchmarks can be solved independently, whereas the instruction-level parallelism results from the potential independence of the RM and DM modules executed on a set of VRP tests. Finally, the artifacts elaborated by the post-processing module can be easily integrated with existing map engines. For simplifying the discussion—although our framework is generic and can be coupled with any engine—we assume that Google Earth and the corresponding Keyhole Markup Language (KML) files are the default choices that help visualize the obtained solutions.

2.2 Benchmark generator

The Benchmark Generator (BG) operates on two (not necessarily disjoint) sets of vertices $\mathcal {V}_1$ and $\mathcal {V}_2$ representing the possible pickup and delivery locations, respectively. The generation process is controlled by five parameters: $\mathcal {N}_r$, being the number of requests to generate (hence, we have $\mathcal {N}_r/ 2$ pickup-delivery pairs), $\mathcal {N}_{vr}$—the suggested^{Footnote 2} number of requests per vehicle, $\mathcal {C}$—the vehicle’s capacity, $\mathcal {W}$—the time window (TW) span taken with respect to the distance between successive locations, and $\mathcal {T}$—the service time.

The benchmark instances are created using the algorithm shown in Fig. 2. Note that its input parameters are sufficient to generate a wide variety of tests having, e.g., clustered or randomly distributed locations, narrow or wide TWs, and so forth. Moreover, while generating a test $\mathcal {B}$, we simultaneously obtain a valid (but not necessarily optimal) solution $\mathcal {S}$, thus we ensure that the instance is solvable. First, the depot location is randomly selected (line 3), and—until the desired number of requests is reached—we repeat the following steps:

1.
labelgeneratorspsstep:2 Randomly select the actual number of requests per vehicle from the range $\left[ \mathcal {N}_{vr}- s\%, \mathcal {N}_{vr}+ s\%\right] $ (line 5).
2.
Randomly select the locations $l_j$ and $l_k$ from $\mathcal {V}_1$ and $\mathcal {V}_2$, and generate some non-zero demand not exceeding $r\%$ of $\mathcal {C}$, where $r$ is a hyperparameter of BG (lines 7–8).
3.
Calculate the pickup and delivery TWs (lines 9 and 12; $t_i, t_j$ are the current times, and $d_{ij}, d_{jk}$ denote the travel times between respective locations), update the current time (line 14), and add requests to the benchmark test and to the corresponding solution (lines 10, 13, and 15).

The validity of the solution $\mathcal {S}$ is ensured as follows:

1.
The precedence constraint is satisfied since within each route the pickup is always put before the delivery (see the TW definitions in lines 9 and 12, Fig. 2).
2.
The capacity constraint is satisfied because the demand never exceeds $\mathcal {C}$, and due to consecutive pickups and deliveries, the remaining capacity can never be negative, and the “amounts” of the corresponding pickups and deliveries are equal.
3.
The TW constraint is satisfied since the TWs are built around the actual time-to-reach of a given location. The latest time-to-return to the depot is calculated after all the routes are established, based on the final deliveries.

Overall, BG generates tests of various characteristics, alongside the corresponding feasible solutions in $O(\mathcal {N}_r)$ time.

2.3 Quantitative, qualitative, and statistical analysis

To perform the quantitative, qualitative, and statistical analysis, we exploit the post-processing module which elaborates a number of various artifacts (Fig. 1). In the following points, we discuss the process of assessing the final VRP solutions and of assessing the capabilities of different algorithms.

2.3.1 Assessment of solutions: feasibility and quality

Before assessing the solution (and the algorithm used to produce it), we first need to ensure that the solution is feasible, hence it adheres to the assumed optimization constraints. The assessment of the solution’s feasibility involves the following steps (for the considered PDPTW case):

1.
Preliminary validation—each passenger^{Footnote 3} has to be picked up from and delivered to the correct location; the same passenger has to appear exactly twice within the solution (once for the pickup and once for the delivery operations).
2.
Precedence validation—for each passenger, the pickup has to occur before the corresponding delivery.
3.
Capacity validation—the capacity must never be excee-ded for any vehicle.
4.
Time window validation—for each request (either pickup or delivery), the actual arrival time at the given location must not fall after the TW closing time. It is, however, allowed to arrive before opening the TW (the vehicle waits at this location in this situation).
5.
Depot validation—each route has to start and finish at the depot; the time of arrival at the depot cannot exceed the time-to-return value, and no route can start before opening the depot’s TW.

Note that the feasibility assessment is generic enough to cover not only PDPTW but also other types of routing problems, such as CVRP (capacity validation), VRPTW (time windows validation), and so forth.

Given that a routing schedule is feasible, we assess its quality primarily based on the number of vehicles used in this solution ( $K$) and the total distance traveled by the vehicles ($T$), as these are two common primary and secondary optimization objectives. Additionally, we capture the convergence time $\tau $, i.e., the time required to produce the final solution. Apart from such classic (primary) quality metrics, we introduce a set of secondary metrics. These metrics carry a lot of practical significance and can be applied to assess the quality of scientific and industry-oriented VRP solvers. Our validation methodology utilizes the following metrics:

1.
Total vehicles’ round-trip time ($\alpha $)—in real-life routing problems (i.e., located within the geographical coordinate system constrained by the features of a specific traffic network), the time required to travel a certain distance is not equal to the distance itself. Measuring the travel time is important from the point of view of obeying the regulations governing the drivers’ work times. This metric should be minimized.
2.
Average vehicles’ round-trip time ($\beta $), calculated as $\alpha / K$, estimates the average duration of a round-trip. Knowing this value and the actual duration of the respective routes, we can observe the variability of the routing schedule. Consequently, we can also draw conclusions about the distribution of workload among drivers assigned to particular routes. This metric should be minimized through the reduction of $\alpha $, without increasing the total number of routes ($K$).
3.
Average number of vehicle round-trips per hour ($\gamma $), calculated as $3600 / \beta $. It indicates the frequency of vehicles’ visits in the given area. It can be also computed for individual routes, to see whether certain locations are visited more frequently than the others. This metric should be maximized since more frequent visits in the given area enable the passengers to choose the most appropriate time to travel, thus make them more satisfied with the transportation system/provider.
4.
Total passengers’ travel distance ($\delta $), calculated as a sum of distances traveled by passengers inside a vehicle (distance registration starts at the pickup location and finishes at the delivery location). Note that this value is always smaller than $T$, since it does not take into account the distances traveled by an empty vehicle between the depot and the first pickup (last delivery) location in any route. In some cases (e.g., dial-a-ride scenario) this metric can be an indicator of passengers’ satisfaction. Ideally, the passengers should be delivered to their destination immediately after pickup, without the need to travel excessive distance to other locations, as this may incur costs. This metric should be minimized.
5.
Total passengers’ travel time ($\varepsilon $)—combined with $\delta $, this metric informs how much time the passengers spend in transit (including the service times at the locations visited after the pickup and before the delivery). Note that for real-life problems, close proximity of certain stops does not mean that the travel times are also short (e.g., the route may include segments of high traffic frequency, increasing the travel times significantly, without affecting the distance traveled). Comparing two solutions, an increase in this metric indicates a change in the route schedule resulting in delayed delivery of one or more passengers. This metric should be minimized.
6.
Average pickup-delivery pair’s travel time ($\zeta $), computed as $\varepsilon / (\mathcal {N}_r/ 2)$. It shows the average time spent by a group of passengers (resulting from a pickup-delivery pair) in a vehicle. This metric should be minimized.
7.
Total cost ($\eta $), calculated as $c_{K}\cdot K+ c_{T}\cdot T/ 1000$, where $c_{K}$ is the cost of a vehicle and $c_{T}$ is the cost per each traveled kilometer (or other unit). In practice, $c_{K}$ can correspond to the salary of a driver, the insurance costs, and so forth, while $c_{T}$ can be the fuel cost. Both $c_{K}$ and $c_{T}$ can be customized to reflect, e.g., different vehicle types (electric, fuel, gas, etc.). This metric should be minimized.
8.
Average cost per route ($\theta $), calculated as $\eta / K$. It helps observe the proportion of cost incurred by the vehicle itself and the distance it travels in a route. Moreover, it indicates the increase in traveled distance needed to balance the cost of the vehicle. This metric should be minimized. The $\theta $ reduction should result from the reduction of $\eta $, and not from the increase in $K$.
9.
Average number of pickup-delivery pairs per vehicle ($\kappa $), calculated as $(\mathcal {N}_r/ 2) / K$ estimates the average route length given in terms of the pickup-delivery pairs. This metric should be maximized, since an increase in $\kappa $ means that the value of $K$ decreases, which is desired, as fewer vehicles incur smaller costs.
10.
Average number of passengers per vehicle ($\lambda $), calculated as $\mathcal {D}/ K$, where $\mathcal {D}$ denotes the total demand (number of passengers) across all pickups. Comparing the value with the vehicle’s capacity $\mathcal {C}$, it can be observed how well the available space in the vehicle is utilized. A low value indicates that smaller (and possibly cheaper) vehicles could be used to handle the routing schedule. This metric should be maximized to better utilize the vehicles’ capacity.
11.
Average arrival waiting time ($\xi $), calculated as $t_{a}/$ $ (\mathcal {N}_r/ 2)$, where $t_{a}$ is the total arrival waiting time. The arrival waiting time pertains to the deliveries, and is determined as the positive difference between the actual arrival time at the delivery location and the left time window margin of this location. In practice, it indicates how long the passenger needs to wait past the optimal arrival time (left TW margin) to be dropped off. To better contextualize it within a practical transportation scenario, let us note that—in the case of a communication hub with very high traffic load—a bus may not be allowed to enter the bus stop prior to its scheduled time to avoid interference with other buses sharing the same stop. Consequently, being at the destination (in terms of geographical coordinates), the passengers may not be allowed to get off the bus earlier due to safety reasons. This metric should be minimized.
12.
Average departure waiting time ($\pi $), calculated as $t_{d}/ (\mathcal {N}_r/ 2)$, where $t_{d}$ is the total departure waiting time. The departure waiting time pertains to the pickups, and is determined as the positive difference between the actual arrival time at the pickup location and the left time window margin of this location. In practice, it indicates how long the passenger needs to wait past the optimal departure time (left TW margin) to be picked up. This metric should be minimized.
13.
Average waiting time ($\rho $)—this metric combines the arrival and departure waiting times discussed before and is given by $(t_{a}+ t_{d}) / (\mathcal {N}_r/ 2)$. It defines the net difference between the actual and optimal pickup/delivery times. This metric should be minimized.

In Table 3, we illustrate the relations between our secondary metrics and the primary metrics $K$ and $T$, as well as the mutual relations among secondary metrics. Except for the cost metrics $\eta $ and $\theta $ which combine both $K$ and $T$, the dependence of other metrics on $K$ results from the fact that we calculate per-vehicle averages. Moreover, although some metrics are related to each other, they carry interpretable, practical significance, as noted before. Besides, the average metrics are handy while performing the analysis of variability of route characteristics for a single solution. Therefore, we believe that all proposed metrics bring a full overview of the algorithm’s capabilities from the practical point of view.

Table 3 The relations between the proposed and primary quality metrics

Full size table

The metrics can be also combined into the following objective functions (1)–(2), which are minimized:

$$\begin{aligned} \mathcal {F}_1&= \omega _1\alpha '+ \omega _2\delta ' + \omega _3\varepsilon ' + \omega _4\eta '\nonumber \\&\quad - \omega _5\kappa ' - \omega _6\lambda '+ \omega _7\xi '+ \omega _8\pi ' + \omega _9\rho ', \end{aligned}$$

(1)

$$\begin{aligned} \mathcal {F}_2&= \omega _1\alpha ' + \omega _2\delta ' + \omega _3\varepsilon ' + \omega _4\eta ' + \omega _5/\kappa '\nonumber \\&\quad + \omega _6/\lambda '+ \omega _7\xi ' + \omega _8\pi ' + \omega _9\rho ', \end{aligned}$$

(2)

where $\omega _i \ge 0$, $1 \le i \le 9$, are weights such that $\sum _{i=1}^{9} \omega _i = 1$, and the parameters $\alpha '$, $\delta '$, $\varepsilon '$, $\eta '$, $\kappa '$, $\lambda '$, $\xi '$, $\pi '$, and $\rho '$ are the aforementioned metrics scaled to the range (0, 1], obtained by dividing each metric value by the maximum value obtained in a number of runs. The functions differ in the way metrics $\kappa $ and $\lambda $ are handled. Since these two metrics should be maximized, while objective functions $\mathcal {F}_1$ and $\mathcal {F}_2$ should be minimized, we propose to take their opposite or inverse values. Although the opposite values in $\mathcal {F}_1$ seem more natural, looking at metrics’ definitions (i.e., their dependence on the inverse of $K$), the objective $\mathcal {F}_2$ turns out to be more intuitive.

The functions have a few crucial properties. Firstly, they carry additional insights regarding the solution’s quality, which cannot be observed based solely on the primary metrics $K$ and $T$—note that within $\mathcal {F}_1$ and $\mathcal {F}_2$, only the metrics $\eta $, $\kappa $, and $\lambda $ depend on $K$ or $T$ (cf. Table 3). Secondly, the functions are quite universal with respect to the coordinate space in which the problem is located. Moving from the geographical coordinates to the Euclidean space, the only differences are that $\alpha = T$ and $\delta = \varepsilon $. However, these changes do not affect the practical significance of either $\mathcal {F}_1$ or $\mathcal {F}_2$. Also, they do not diminish the added value provided by these objectives with respect to the typical primary metrics. Finally, thanks to the weights $\omega _i$, the functions provide a great dose of flexibility as to the importance of respective metrics. This way, the functions can be adjusted to individual needs or a specific scenario.

2.3.2 Assessment of algorithms: statistical and non-functional analysis

Observing only the differences across various quality metrics is not enough without knowing whether these differences are significant in the statistical sense. Our framework employs the statistical analysis component, which verifies the statistical significance of the differences in $K$, $T$, $\tau $, and possibly other metrics. Although we use the Wilcoxon signed-rank test to confront two algorithms over a set of benchmark instances, any statistical test for matched results may be equally valid and applicable here. Additionally, we can effectively utilize this component to understand the convergence abilities of randomized algorithms, as we can conveniently run tests, e.g., Kruskal-Wallis with post-hoc Dunn’s, for repeated executions of the very same optimization approach. The advantage of incorporating the statistical analysis component into the framework is that the obtained p-values provide clear evidence behind the claims concerning the optimization performance of any investigated technique.

The above metrics allow to assess the quality of the solutions on the aggregated level, i.e., for the entire benchmark set encompassing a number of separate test instances. However, they often fail to provide some deeper insights that could be captured for particular instances. As an example, understanding how similar (or different) two VRP solutions with the same $K$ and comparable $T$ are. To answer this question, we propose a novel idea of using new graph similarity metrics.

Let $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ be two directed graphs representing the VRP solutions, with $V_1, V_2$ being the sets of labeled vertices, and $E_1, E_2$ denoting the sets of (weighted) edges. The edges are not explicitly labeled, but they acquire the labels resulting from the vertices they connect, i.e., the label of an edge connecting vertices $v_1, v_2$ is $(v_1, v_2)$. Let $L_1, L_2$ denote the sets of edge labels for graphs $G_1, G_2$, respectively. We propose the following similarity metrics (3), (4), and (5), based on counting the number of common walks:

$$\begin{aligned} \mathcal {K}_1 = \frac{|L_1 \cap L_2|}{|L_1 \cup L_2|}, \end{aligned}$$

(3)

$$\begin{aligned} \mathcal {K}_2 = \frac{\sum _{L_1 \cap L_2} 1 / (\left| o_1 - o_2\right| + 1)}{|L_1 \cup L_2|}, \end{aligned}$$

(4)

$$\begin{aligned} \mathcal {K}_3 = \frac{2 \cdot \sum _{L_1 \cap L_2} w_e}{\sum _{L_1} w_e+ \sum _{L_2} w_e}, \end{aligned}$$

(5)

where $o$ is the edge offset within a route, relative to the depot, $w_e$ denotes the weight of the corresponding edge, and $\mathcal {K}_1$ can be treated as the Jaccard index calculated for graphs.

The intuition behind these metrics is that the solution graphs G are given as unordered sets of routes of the form $v_1, w_1, v_2, \ldots , w_l, v_l$, where $v_i \in V$, $w_i$ are weights, $v_1 = v_l$ is the depot, and l is the route length. Consequently, to express the similarity between two such graphs, we need to look for the same route segments represented by the same labels.

The similarity metrics $\mathcal {K}_1$–$\mathcal {K}_3$ can be computed efficiently in $O(|E_1| + |E_2|)$ time. Their values are also easy to interpret. In particular, $\mathcal {K}= 1$ means that $G_1 \equiv G_2$, $\mathcal {K}= 0$ is obtained if $L_1 \cap L_2 = \emptyset $, and the closer the value to 1 becomes, the greater the similarity between $G_1$ and $G_2$ is obtained. The metric $\mathcal {K}_1$ counts the number of common edge labels in two graphs. However, it does not consider the relative offsets of the common edges in the graphs. On the other hand, $\mathcal {K}_2$ adjusts the similarity metric based on the offsets of edges relative to the depot. It gives preference (larger impact on the $\mathcal {K}_2$ value) to the edges that appear at the same offsets in the compared graphs. Finally, $\mathcal {K}_3$ puts the analysis into the real-world context through including the weights, being e.g., the distances. Overall, the three similarity metrics give us a comprehensive and quantifiable view of the distance between any pair of directed graphs that represent the solutions of any VRP.

Table 4 Our similarity metrics obtained for the solutions shown in Fig. 3

Full size table

Let us illustrate the similarity metrics’ performance with an example. In Fig. 3, we depict a couple of example solutions to a given VRP. For a human expert, it may look obvious that the initial three solutions look similar to each other, while the other two seem to be substantially different, however the precise quantification of such differences remains an open challenge. Additionally, for a large set of test instances, manual comparison is infeasible and tedious. To tackle this issue, we can calculate $\mathcal {K}_1$–$\mathcal {K}_3$ (Table 4). Consider, for example, the graphs $G_1$ and $G_3$. The set of labels for $G_1$ is (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 0), (0, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 0), while the set of labels for $G_3$ is (0, 3), (3, 4), (4, 1), (1, 2), (2, 5), (5, 6), (6, 0), (0, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 0). The union of these two sets is of size 17, while the intersection is of size 11. Hence, $\mathcal {K}_1 = 0.65$. The offsets of common labels for the two graphs are as follows: ((1, 2), 1, 3), ((3, 4), 3, 1), ((5, 6), 5, 5), ((6, 0), 6, 6), ((0, 7), 0, 0), ((7, 8), 1, 1), ((8, 9), 2, 2), ((9, 10), 3, 3), ((10, 11), 4, 4), ((11, 12), 5, 5), ((12, 0), 6, 6). Therefore, the numerator of $\mathcal {K}_2$ is 9.66, and so $\mathcal {K}_2 = 0.57$. Finally, the sum of weights for the common labels is 432 (note that we include each weight twice, once for each graph), while the sum of all weights for $G_1$ and $G_3$ is 558, and so $\mathcal {K}_3 = 0.77$.

In more general terms, based on Table 4, we can observe that $\mathcal {K}_1$ considers $G_2$ and $G_3$ to be equally similar to $G_1$. The $\mathcal {K}_2$ metric provides the best means for differentiation of $G_5$ from $G_1$–$G_3$. Finally, we can appreciate how the real-life route characteristics, being reflected in the weights associated with the edges affect the values of $\mathcal {K}_3$. Using the proposed similarity metrics, we obtain quantitative and objective information in a repeatable way. They also help us draw some conclusions upon the routing algorithms. If an algorithm produces very similar solutions every time it runs, we conclude it has good convergence and search space exploitation capabilities. On the other hand, if the solutions differ between runs, the exploration capabilities prevail.

To get some more insights into the proposed metrics, we performed the following experiment. We generated all possible two-route solutions to a PDP problem with 8 requests (4 pairs of pickups and deliveries). Then, we calculated $\mathcal {K}_1$–$\mathcal {K}_3$ for all obtained pairs of solutions and collected the frequencies of all unique similarity values for each metric—these frequencies are plotted in Fig. 4. Looking at the plots, we observe that the metrics differ in the amount of detail provided, expressed as the number of produced unique values—$\mathcal {K}_1$ gives only 10 unique values, while $\mathcal {K}_3$ generates over $38\cdot 10^3$ values. Moreover, although the frequencies of the border values (0.0 and 1.0) are the same for $\mathcal {K}_1$-$\mathcal {K}_3$, their contribution relative to the other values is significantly different depending on the similarity metric. Let us also note that if we were to compare the solutions using classic metrics like $K$ or $T$, we would gain much less useful information. On the one hand, the $K$-based comparison would simply treat all the solutions as being exactly the same. On the other hand, $T$-based comparison produces almost $24\cdot 10^3$ unique values (absolute differences between $T$’s of two solutions), but they are harder to interpret. For example, the value of 0.0 indicates that the graphs are considered the same, but for the graphs having completely different edges there is no specific value to be expected (as opposed to the value of 0.0 in the case of $\mathcal {K}_1$–$\mathcal {K}_3$). To be more precise, graphs having no common edges according to our metrics, produce differences in $T$ in the range [0.23, 168.75]—note that the overall range of values for this metric is [0.0, 178.49], so looking at the differences in $T$ does not help to tell whether two graphs are similar or have nothing in common.

2.3.3 Assessment of solutions and algorithms: visualization and exploration

Apart from the quantitative analysis, the framework also facilitates graphical analysis of the solutions. It exploits the KML files which can be conveniently visualized in a number of widely available tools, such as Google Earth. Using these artifacts, we can not only display the solution on a map, but also assess the solutions visually and compare them with the outcomes provided by human experts.

The visualization of obtained solutions in the framework comes in two flavors. The first one presents a complete solution, with the respective routes placed in separate folders. This way, the user can explore the solution easily by showing/hiding complete routes. The second visualization method involves generation of individual routes as separate KML files, to facilitate more interactive exploration. Using the presentation mode, the user can navigate between successive stops to fully understand (and validate) the architecture of the obtained route.

3 Experiments

Our study includes three experiments. In Experiment 1, we investigate the functional abilities of the entire validation pipeline through generating real-life PDPTW instances located in the Gliwice area, Poland, and solving them using selected state-of-the-art techniques (Section 3.2). Although we can use our BG to synthesize instances located in any map, we focused on the area that covers the bus stops utilized by the Blees’s^{Footnote 4} employees that travel to the office on a daily basis—this approach brings a real-life aspect into our experimentation. Experiment 2 walks us through the process of confronting selected algorithms over the known benchmark test instances, and shows how our methodology can be used to easily compare new and existing optimization techniques over widely-used and established test instances (Section 3.3). In both experiments, we take the PDPTW as an example of a rich VRP [55] (Section 3.1). Our validation approach is, however, independent of the underlying problem formulation and can be easily adjusted to handle other VRPs through updating the optimization objectives and constraints. Finally, Experiment 3 further proves the practical utility of our system over the sample data collected from the operating Demand Responsive Transport system.

3.1 Problem formulation

The PDPTW is defined on a directed graph $G=(V,E)$, with a set $V$ of $C = \left| \mathcal {V}_1\right| +\left| \mathcal {V}_2\right| +1$ vertices and a set of edges $E$. The vertices $v_i$, $i\in \{1,2,\dots ,\left| \mathcal {V}_1\right| +\left| \mathcal {V}_2\right| \}$, represent the locations of the requests, and $v_0$ indicates the location of the depot. A set of edges $E=\{(v_i,v_{i+1})|v_i,v_{i+1}\in V, v_i \ne v_{i+1}\}$ represents the travel links between particular passengers. The travel costs $c_{i,j}$, $i,j \in \{0,1,\dots ,C\}$, $i \ne j$, are equal to the distances (in the Euclidean metric) between the travel points. Each request $h_i$, $i\in \{0,1,\dots ,\mathcal {N}_r\}$ is a coupled pair of pickup ($\mathcal {V}_1$) and delivery ($\mathcal {V}_2$) passengers, with the pickup and delivery demands, service times ($s_i$), and time windows within which the corresponding pickup and delivery should start ($[e_i, l_i]$), where $\mathcal {V}_1\cup \mathcal {V}_2= V\setminus \{v_0\}$ and $\mathcal {V}_1\cap \mathcal {V}_2= \emptyset $. For each request $h_i$, the amounts of delivered ($q^d(h_i)$) and picked up ($q^p(h_i)$) goods are defined, where $q^d(h_i)=-q^p(h_i)$. The pickup must always occur before the corresponding delivery, and both must be served in the same route. The fleet of $K$ homogeneous vehicles with the capacity $\mathcal {C}$ serve the customers, and each route starts and finishes at the depot.

The PDPTW solution $\sigma $ is a set of routes, where each route r is $r=\left<v_0,v_{r(1)},v_{r(2)},\dots ,v_{n+1}\right>$, and it starts and finishes in the depot (therefore, we have $v_0 = v_{n+1}$).

3.1.1 Objectives

The PDPTW is commonly considered to be a two-objective discrete optimization problem, with the primary objective of minimizing the fleet size $K$, where $K\ge K_\textrm{min}$ ($K_\textrm{min} = \left\lceil D/\mathcal {C}\right\rceil $, and $\mathcal {D}=\sum _{i=1}^N{q^d(h_i)}$ denotes the total delivery demands that are to be served).

The second objective (6) is to minimize the total distance (T) traveled by all the vehicles serving the transportation requests:

$$\begin{aligned} T=\sum \limits _{i=0}^C { \sum \limits _{j=0}^C { \sum \limits _{k=1}^K {x_{(i,j,k)}c_{(i,j)}}}}. \end{aligned}$$

(6)

In the PDPTW, there are three decision variables (7), (8), and (9):

$$\begin{aligned} x_{(i,j,k)} (i,j \in \{0,1,\dots ,C\}, i \ne j, \forall k \in K), \end{aligned}$$

(7)

$$\begin{aligned} a_{i} (i \in \{0,1,\dots ,C\}),~\text {and} \end{aligned}$$

(8)

$$\begin{aligned} \hat{w} _{i} (i \in \{1,2,\dots ,C\}). \end{aligned}$$

(9)

If the k-th vehicle travels from $v_i$ to $v_j$, where $i\ne j$, then $x_{(i,j,k)}=1$ (it is 0 otherwise). The two other decision variables, $a_i$ and $\hat{w}_i$, indicate the arrival and the waiting times at $v_i$ (additionally, we can observe that $a_0 = e_0$).

Let $\sigma _A$ and $\sigma _B$ be two feasible PDPTW solutions. If we consider the two aforementioned objectives, then $\sigma _A$ is of a higher quality than $\sigma _B$, if $($ $K(\sigma _A) < K(\sigma _B)$ $)$ or $($ $K(\sigma _A) = K(\sigma _B)$ and $T(\sigma _A) < T(\sigma _B)$ $)$.

3.1.2 Constraints

The constraints (10), (11), (12), (13), (14), and (15) may be expressed as:

$$\begin{aligned} \sum \limits _{k=1}^K {\sum \limits _{j=0, j \ne i}^C {x_{(i,j,k)}} } = \sum \limits _{k=1}^K {\sum \limits _{i=0, j \ne i}^C {x_{(i,j,k)}} } = 1 \quad (\forall i, j \in V), \end{aligned}$$

(10)

$$\begin{aligned} \sum \limits _{j=1}^C { x_{(i,j,k)}} = \sum \limits _{j=1}^C { x_{(j,i,k)}} = 1 \quad (i = 0, \forall k \in K), \end{aligned}$$

(11)

$$\begin{aligned} \sum \limits _{j=1}^C { \sum \limits _{k=1}^K {x_{(i,j,k)} = K}} \quad (i = 0), \end{aligned}$$

(12)

$$\begin{aligned} \sum \limits _{i=1}^C { q^d(h_i) \sum \limits _{j=0, j \ne i}^C {x_{(i,j,k)} \le \mathcal {C}}} \quad (\forall k \in K), \end{aligned}$$

(13)

$$\begin{aligned} e_i \le a_i + \hat{w}_i \le l_i \quad (\forall i \in V),~\text {and} \end{aligned}$$

(14)

$$\begin{aligned} \max \{ a_{i} + s_{i} + c_{i,j}, e_j\} \le l_j, \end{aligned}$$

(15)

for all pairs (i, j) such that $i \ne j$, and $x_{(i,j,k)}=1$, for a given k. Every customer is visited exactly once (10), all routes start and finish at the depot (11), and the fleet size is equal to K (12). Also, if a truck arrives at a certain travel point, then it departs from the same point. The capacity (13) and the time window constraints (14)–(15) must hold for each route as well. Therefore, the total amount of goods being delivered cannot exceed $\mathcal {C}$, and the service of each customer must be started before its time window closes.

3.2 Experiment 1: end-to-end processing

To execute the complete pipeline, we exploited the sets of real-life pickup and delivery locations ($\mathcal {V}_1$ and $\mathcal {V}_2$), being the bus stops, alongside the pickup points of the Blees’s employees located in the Gliwice area, Poland (they were provided via a KML map file). The benchmark instances (25 in total^{Footnote 5}) were generated using BG with the following parameters controlling the difficulty of the tests: $\mathcal {N}_r= 100$, $\mathcal {N}_{vr}= 10$, $\mathcal {C}= 200$, $\mathcal {W}= 0.1$, $\mathcal {T}= 90$, $s= 50$, and $r= 25$ (see Section 2.2).

The solver module included ISG, one RM algorithm and two DM algorithms. Each component was run 10 times for no more than two minutes per execution. The solutions produced by each technique constituted the initial population for the next component in the pipeline. First, ISG assigned each pickup-delivery pair to a separate vehicle (generating “virtual” vehicles), and then reduced the number of vehicles until the fleet size was obtained. The RM algorithm employed our ejection-based approach, in which requests are ejected from a randomly selected route, and then feasibly reinserted into other routes [56]. Finally, DM involved a local search (LS) algorithm focusing on the in/out relocate/exchange moves that were iteratively employed to minimize the total travel distance of a feasible solution, and our memetic algorithm (MA), being a hybrid evolutionary technique which exploits local search moves to intensify the optimization in the most promising parts of the search space [55]. The RM and DM approaches continued their execution until the time limit was reached. Note that the parameterization of the solvers was not optimized, and we kept the hyperparameter values as suggested in the corresponding papers in all experiments.

In Fig. 5, we depict the primary metrics ($K$, $T$ [in km], and $\tau $ [in seconds]), alongside our secondary metrics obtained using all investigated techniques (see the detailed results in the supplement available at https://gitlab.com/jnalepa/standardized_vrp). As expected, the main drop in $K$ is observed between ISG and RM, without further changes in the DM component, as it focuses on minimizing $T$. Although the distance $T$ may effectively decrease in both RM and DM, its change in the RM component results from the corresponding change in $K$. Note that this change is also reflected in the $T$, $\alpha $, $\gamma $, and $\eta $ metrics. Finally, we can appreciate that ISG consistently converges very fast, while LS and MA are far more stable than RM in this respect.

Table 5 The ranking values obtained using the investigated algorithms for all secondary metrics, averaged across all the generated tests

Full size table

The secondary metrics presented in Fig. 5 bear a lot of practical significance (the ranking values^{Footnote 6} of all methods obtained over all secondary metrics are given in Table 5; here we do not report the ranking values for $\mathcal {F}_1$ and $\mathcal {F}_2$ for brevity). The vehicles round-trip time $\alpha $ indicates the number of drivers needed to handle a day-long routing plan, whereas $\delta $ and $\varepsilon $ show whether the passengers are transported directly to their destinations or have to visit some other stops on their way. The total cost $\eta $ encompasses the costs resulting from each vehicle (e.g., drivers salary or vehicle insurance) and traveled distance (e.g., fuel)—in Fig. 5, unit costs were assumed, but they can be customized according to one’s needs. Finally, $\lambda $ shows the used capacity of the vehicles, and its maximization can lead to better utilization of the available fleet. Such secondary metrics not only shed more light on practical characteristics of the solutions, but allow to consistently pick the algorithm that outperforms the others according to the preferred measures (Table 5). It is of note that the convergence times of the specific modules exploited in the current version of the optimization framework vary significantly. For the ISG module, the run time was always below a second, for the RM module it ranged from about 13 s up to 90 s, for the DM (LS) module it went from 105 s to 112 s, while for the MA module it could be as short as 65 s and as long as 108 s. Let us note, however, that the modules are replaceable, so the convergence times, and consequently the running times of the complete execution pipeline may vary from the ones reported here, depending on the applied algorithmic solutions. Since the modules were executed sequentially, the total convergence time of the algorithm (understood as the complete framework of four modules) ranged from 201 s to 305 s (with the average of 263 s and the standard deviation of 26.5 s).

With equal weights, the combined objective functions $\mathcal {F}_1$ and $\mathcal {F}_2$ behave similarly to each other with the difference in magnitude resulting from the varying treatment of the $\kappa $ and $\lambda $ metrics. However, due to the flexibility of $\mathcal {F}_1$ and $\mathcal {F}_2$ resulting from the weights, and also thanks to the customization of the $\eta $’s parameters, we can model different real-life scenarios. For instance, let us assume that we are interested in passengers’ satisfaction measured by the smallest waiting time and that the passengers are less willing to be late for the dropoff point than at the pickup. We can then assume the following weights $\omega _7 = 0.5$, $\omega _8 = 0.25$, $\omega _9 = 0.25$, and $w_i = 0$ otherwise. One of the benchmarks reports the same $K$ for all 10 runs with the difference in $T$ between the best and worst solution being almost 5.7 km. However, the smallest waiting time (i.e., the best objective function value) is achieved for the largest $T$. Therefore, in this scenario, the passengers can accept a longer route to reduce their waiting time and they will pick a different solution than the one based on metrics $K$ and $T$.

In Fig. 6, we present the distribution of values reported by $\mathcal {K}_1$–$\mathcal {K}_3$ for our benchmarks. Note that the ISG component consistently produces the same solution (all metrics report perfect similarity). Furthermore, the solutions produced by other components differ significantly from the initial ones (as indicated by low similarities for pairs ISG/RM, ISG/LS, and ISG/MA). We observe that LS and MA tend to produce similar results, which indicates their good convergence capabilities.

A set of example visualizations (based on the Google Earth engine and KML files generated by the post-processing module) of the obtained solutions (for a single test instance) is rendered in Fig. 7. Such visualizations help us not only better understand the layout of the final solution, but can also allow us to manually fine-tune the obtained routing schedules if necessary, perhaps in some specific parts of the map.

Finally, our validation approach enables us to perform the statistical analysis to see whether the observed differences in primary (or secondary) metrics are significant. Since $K$ for RM, LS, and MA are tied, the p-values could not be calculated (Table 6). For other cases, we can observe that employing different algorithms lead to obtaining significantly different final solutions and convergence abilities (at $p<0.01$). The two-step optimization methodology used in this work toward solving the rich VRPs is clearly better than the single-objective optimization algorithms in which either the number of routes K or the total distance T is optimized, since they usually ignore the other metrics. The flexibility of our framework allows us to easily build such optimization cascades (and replace specific algorithmic components, e.g., the algorithm for minimizing the number of routes, the traveled distance, or both). Since the RM and DM modules focus on optimizing different aspects, their combined performance provides us with solutions that are competitive with respect to the state-of-the-art ones. Overall, this experiment shows that the artifacts generated by all framework components can be thoroughly analyzed to give us a detailed overview of the algorithms’ behavior and capabilities.

Table 6 The p-values obtained using the Wilcoxon signed-rank test

Full size table

3.3 Experiment 2: comparing algorithms

In this experiment, we confront the aforementioned DM algorithms and compare them in more detail over a widely-used Li and Lim’s PDPTW benchmark. Here, our aim is to present the flexibility of the proposed methodology, in which we can exploit existing benchmarks that have been already utilized in the literature to compare emerging solvers. Hence, from the scientific point of view, we can conveniently extract the quality measures obtained over the known data to directly compare them with those reported in other papers. We used 60 instances of the 200-request tests with clustered, randomized, and mixed locations (C, R, and RC groups, respectively) with small vehicle capacities and short TWs (C1, R1, and RC1), and with larger capacities and wider TWs (C2, R2, RC2)—in each group, we have ten distinct test cases^{Footnote 7}. Both LS and MA algorithms were supplied with the same initial population composed of 10 solutions generated beforehand by the RM component. The collected $K$, $T$, and $\tau $ metrics for this experiment are shown in Fig. 8. The distribution of $K$ and $T$ for both algorithms is similar, and the Wilcoxon signed-rank test confirms that there are no significant differences in these two metrics (the p-value amounts to 0.12).

Table 7 The ranking values obtained using LS and MA for all secondary metrics, averaged across all Li and Lim’s instances

Full size table

Table 8 Results of correlation analysis for Li and Lim’s benchmarks, expressed as percentages of all analyzed cases within each group

Full size table

The validation framework enables us to aggregate the artifacts at various levels—we render the comparison of $K$’s and $T$’s in Fig. 9 for each Li and Lim’s group separately. Also, it shows the best known results available in the literature (WB)—here, it is important to note that the WB schedules were obtained using various algorithms (i.e., not a single algorithm), hence should be considered as the current (known) quality upper bound of $K$ and $T$. Both LS and MA provide similar $K$’s, regardless of tests’ characteristics. On the other hand, the results are more diverse in $T$, especially for MA and C2 (compared with C1), R2 (vs. R1), and RC2 (vs. RC1) which indicates that some algorithms are more efficient in tackling specific VRP instances, and converge in such situations much better. Hence, knowing their characteristics beforehand may ultimately help us select the best-suited optimization technique in either fully- or semi-automated way, perhaps exploiting additional expert knowledge concerning e.g., other features of the transportation ecosystem.

In Fig. 10, we present the results of the transportation system profitability analysis for the Li and Lim’s benchmarks. We evaluate the profitability function $\mathcal {P}= c_{\text{ p }}\cdot \lambda - \theta $, where $c_{\text{ p }}$ is the value introduced by a single passenger (e.g., the price of a ticket). Note that for unit values of $c_{K}$ and $c_{T}$, we always get a positive value of $\mathcal {P}$ (denoting the obtained profits). For larger $c_{K}$’s, the transport becomes profitable, regardless of the benchmark type, only when $c_{\text{ p }}\ge 8$. The profitability analysis, besides its practical significance, allows us also to differentiate between various problem types forming clusters of similar $\mathcal {P}$ values for related problem instances. Finally, Table 7 gathers the ranking values obtained for all secondary metrics averaged over all problem instances in the Li and Lim’s groups. This analysis helps us select the algorithm according to the most important optimization criteria—as an example, if we consider $\zeta $, $\eta $ or $\pi $ to be more important factors of our ITS, we should pick MA as our algorithm of choice, because it leads to better solutions in these quality metrics. Note that in this example we confront LS and MA (two example solvers)—we do not include WB here, as WB is actually the collection of various algorithms, not a single optimization technique (as mentioned earlier).

The results obtained for the Li and Lim’s benchmarks allow us to observe the significance of the proposed secondary metrics. Firstly, assuming $\varepsilon $, $\delta $, and $\rho $ to be the meters of passengers’ satisfaction, we note that even for greater $K$, we could see big improvements in these metrics (e.g., for the lc1_2_10 benchmark we got a reduction of 3300 s, 1075 m, or 33 s for $\varepsilon $, $\delta $, and $\rho $, respectively). Secondly, given two solutions with ($K_1$, $T_1$) and ($K_2$, $T_2$), such that $K_1 = K_2$ and $T_1 \sim T_2$, we often observe large differences in $\varepsilon $, $\delta $, and $\rho $. For example, for the lc2_2_2 benchmark, a difference in $T\approx 3$ m corresponded to over $-8200$ s difference in $\varepsilon $ or over $-3700$ m in $\delta $. Similar observations apply also to other groups, e.g., for the lr2_2_9 benchmark an increase in $T$ by less than 0.5 m corresponds to a reduction of $-1170$ m in $\delta $.

To get a better view on the correlation between the primary and secondary metrics, we analyzed how a change in $T$ (for the same value of $K$) is reflected in the changes of $\rho $, $\delta $, and $\varepsilon $, being the proposed quality metrics that may reflect the passengers’ satisfaction. The results, collected in Table 8, show that the directly proportional changes of the secondary metrics with respect to changes in $T$ constitute 44.7%–71.8% (55.7% on average) of cases for LS, and 46.5%–74.2% (58.8% on average) of cases for MA. Note also that for LS, the share of directly proportional changes is always greater in the C2, R2, and RC2 groups as compared to the other ones. We observe a similar trend for MA, except for the RC groups. Finally, we note that certain portion of the analyzed cases, both for LS and MA, turned out to be indifferent to the changes in $T$, as indicated in the columns marked with ‘−’ in Table 8.

In Fig. 11, we gather the values obtained using our similarity metrics $\mathcal {K}_1$–$\mathcal {K}_3$ over all Li and Lim’s groups. These distributions show, on the one hand, the convergence capabilities of the algorithms for test instances of specific characteristics (Init., LS/LS, and MA/MA). Calculating the similarities for the pairs of solutions obtained using different techniques (LS and MA vs. Init., and LS/MA) objectively indicates how much different the solutions elaborated to the corresponding test cases are. We can appreciate that $\mathcal {K}$’s are consistently largest for LS/LS, MA/MA, and LS/MA which means that the optimization leads to similar solutions for all groups. Interestingly, $\mathcal {K}$’s increase for LS and MA when compared to the initial population which indicates that the optimization leads to a known part of the solution space, and the elaborated schedules seem to be not far from some initial solutions. To further verify the abilities of the proposed similarity metrics, we ran classic random walk and labeled random walk graph kernels in this scenario [75]. They were not only unable to capture the graph similarities (consistently reporting 1.0, hence perfect similarity) but they ran much slower—it took 36.20 s on average to calculate kernels for all solutions and all variants in Fig. 11 in the case of the vanilla random walk (the labeled random walks tend to timeout here and ran for more than 60 min for each pair of schedules), whereas it was only 0.11 s, 0.23 s, and 0.16 s on average for our metrics $\mathcal {K}_1$–$\mathcal {K}_3$. The classic kernels suffer from high time complexity, with the labeled version being also very memory-intensive. Furthermore, these kernels do not consider the characteristics of the solution graphs such as the fact that the walks should start and finish only at the depot, or they are finite. Finally, the labeled kernel fails to report zero similarity for a pair of graphs having no common labels.

3.4 Experiment 3: real-life use case

To further prove the practical utility of our system, we collected sample data from an existing Demand Responsive Transport (DRT) system operated by Blees in the Northwestern part of Poland. To this end, we constructed a problem instance with only two vehicles able to carry 8 passengers at a time, which were to handle a set of 60 requests (30 pickup-delivery pairs) spread over a whole working day starting at 7 a.m. and ending at 7 p.m. The service time at each stop was 60 seconds and the time windows spanned from 2 to 8 minutes. The hourly distribution of the number of requests is shown in Fig. 12. It can be observed that the peak hours occurred at 10 a.m. and 4 p.m., with the least popular hours being 9 a.m., 11 a.m., and 3 p.m.

Given this problem instance, we were able to solve and evaluate the obtained solutions using the proposed framework. Some of the most important findings are as follows:

1.
The ISG module produced 10 unique solutions, with the value of metric T ranging from 323183.4 m to 366984.9 m, while in all the remaining modules, each run produced the same solution.
2.
The final best solution (shown in Fig. 13), obtained using the DM module (the solution was not improved by the MA module) reported the T metric equal to 313282.1 m, which is an improvement of around 3.1% with respect to the best solution produced by the ISG module.
3.
To satisfy all the requests within their time windows, two vehicles were needed. This is consistent with the outcomes recorded by the DRT system, operating a single vehicle, in which a number of delayed pickups/deliveries were observed.

4 Conclusions and future work

Verification and validation of existing and emerging algorithms for tackling rich VRPs in ITSes trigger various practical challenges. We make a step toward establishing an adopted way of thorough examination of VRP solvers, and propose an approach that allows us to not only tackle any rich VRPs, but also to generate benchmark tests that are proven to be solvable. We introduced novel graph similarity metrics and showed that they can be effectively utilized to quantify the distance across the obtained solutions, hence to understand the convergence capabilities of the investigated algorithms. Finally, our secondary quality metrics that couple the standard measures commonly used to assess the routing schedules highlight the practical characteristics of the solutions, and allow to perform real-life investigations, such as the profitability analysis of the ITS. To experimentally show the abilities of the proposed techniques, we focused on the pickup and delivery problem with time windows, and executed a thorough experimental study, effectively split into three experiments. In the first experiment, we investigated the functional abilities of the entire validation pipeline proposed in this paper, and generated real-life PDPTW test instances in the Gliwice area, Poland, which were later solved using selected algorithms from the literature. Afterwards, in the second experiment, we showed how to objectively compare the algorithms over the well-established benchmark test instances in an unbiased and fair way. Finally, the third experiment focused on proving the practical utility of our system through deploying it over the sample data collected from the existing Demand Responsive Transport system, currently operating in the Northwestern part of Poland. Although our framework can be easily adjusted to other VRPs, we showed that the artifacts generated at different processing steps of our validation chain enable practitioners to execute quantitative, qualitative, and statistical analysis of the algorithms in the context of constrained pickup and delivery problems. We believe that the proposed validation process is a comprehensive comparison protocol that could help us perform reproducible, fully quantifiable and traceable research in the area of smart delivery systems and ITSes, and it will ultimately make following good research practices easier and straightforward. Effectively, we believe that our approach will be an important step toward combating the reproducibility crisis in the artificial intelligence research.

The research reported in this manuscript constitutes an interesting departure point for future developments. Solving rich VRPs is an extremely vital field, and a number of efficient algorithms for tackling various formulations of such transportation problems emerge monthly—it would certainly be interesting to experimentally prove the flexibility of the proposed validation framework in a variety of intelligent algorithms and variants of VRPs. Additionally, as selecting the appropriate parameter values in transportation solvers may play a key role in obtaining their high-quality operation, incorporating this step into our benchmarking approach for the best-performing state-of-the-art algorithms could help enhance their abilities even further. This would help us solidify the design of the framework through testing the interfaces between its components, as such components would correspond to specific scenarios (i.e., the algorithm, problem formulation, and test data tuples). Also, parallel and distributed algorithms have been blooming in the field, leading to the world’s best routing schedules in e.g., VRPTW and PDPTW, as reported on the SINTEF website for the widely-used benchmarks^{Footnote 8}. Investigating such techniques in our framework would be an exciting research pathway that could indeed shed more light on their practical aspects, as deploying them in hardware-constrained environments (e.g., without dozens of parallel processors that can effectively cooperate) may become an obstacle in their fast adoption in industry. We believe that deploying our framework in practical transportation scenarios can help practitioners make more informed and robust decisions concerning the design of their intelligent transportation systems.

Data Availability

The datasets generated during and/or analyzed during the current study are available in the supplementary material available at https://gitlab.com/jnalepa/standardized_vrp. The Li and Lim’s benchmark problem instances analyzed in this work are available at https://www.sintef.no/projectweb/top/pdptw/li-lim-benchmark/.

Notes

Imagine selecting a parallel evolutionary solver that had been executed on 50 parallel processors to elaborate the best solutions of e.g., the Li and Lim’s benchmark, and running it on an embedded low-cost CPU deployed in an autonomous vehicle. Although we should pick this technique as the preferred solution according to the primary quality metrics (as it smashed all other solvers and delivered the world’s best routing schedules for a set of benchmark test cases), we would have omitted the practical aspect of its target implementation, hence the software would have been unusable. Therefore, exploiting just the number of routes and the total distance traveled may easily lead to losing the big picture of our ITS. We address this issue in this paper.
The actual number of requests per vehicle may vary by $\pm s\%$ with respect to the suggested value, where $s$ is a hyperparameter of BG.
For readability, we assume that each pickup-delivery pair reflects a passenger traveling from and to the corresponding locations.
Blees is a Smart Mobility company—for more details, see https://blees.co/en/.
The maps of pickup and delivery points, tests and example solutions are available at https://gitlab.com/jnalepa/standardized_vrp.
All secondary metrics, obtained for all tests belonging to our benchmark set and the Li and Lim’s benchmark suite are included in the supplement at https://gitlab.com/jnalepa/standardized_vrp.
The Li & Lim tests and the world’s best solutions are available at https://www.sintef.no/projectweb/top/pdptw/li-lim-benchmark/documentation/.
https://www.sintef.no/projectweb/top/

References

Altabeeb AM, Mohsen AM, Abualigah L, Ghallab A (2021) Solving capacitated vehicle routing problem using cooperative firefly algorithm. Appl Soft Comput 108:107403
Google Scholar
Bi J, Yuan H, Duanmu S, Zhou M, Abusorrah A (2021) Energy-optimized partial computation offloading in mobile-edge computing with genetic simulated-annealing-based particle swarm optimization. IEEE Int Things J 8:3774–3785
Google Scholar
Blocho M (2020) Exact algorithms for solving rich VRPs. In: Nalepa J (ed) smart delivery systems. Solving complex vehicle routing problems. Elsevier, Intelligent Data Centric Systems, pp 93–100
Blocho M (2020) Heuristics, metaheuristics, and hyperheuristics for rich vehicle routing problems. In: Nalepa J (ed) smart delivery systems. solving complex vehicle routing problems. Elsevier, Intelligent Data Centric Systems, pp 101–156
Blocho M, Jastrzab T, Nalepa J (2022) Cooperative co-evolutionary memetic algorithm for pickup and delivery problem with time windows, In: Proceedings of the genetic and evolutionary computation conference companion, Association for Computing Machinery, New York, USA. pp 176-179. https://doi.org/10.1145/3520304.3528782
Blocho M, Jastrzab T, Nalepa J (2023) Parallel cooperative memetic co-evolution for VRPTW. In: Silva S, Paquete L (eds) Companion proceedings of the conference on genetic and evolutionary computation, GECCO 2023. Companion Volume, Lisbon, Portugal, July 15-19, 2023, ACM. pp 53–54. https://doi.org/10.1145/3583133.3596436
Blocho M, Nalepa J (2017) LCS-based selective route exchange crossover for the pickup and delivery problem with time windows, In: Evolutionary computation in combinatorial optimization, Springer. pp 124–140
Cao Z, Lin C, Zhou M (2021) A knowledge-based cuckoo search algorithm to schedule a flexible job shop with sequencing flexibility. IEEE Trans on Autom Sci Eng 18:56–69
Google Scholar
Cavalcanti Costa JG, Mei Y, Zhang M (2023) Learning to select initialisation heuristic for vehicle routing problems, In: Proceedings of the genetic and evolutionary computation conference, Association for Computing Machinery, New York, USA. pp 266-274. https://doi.org/10.1145/3583131.3590397
Chen C, Demir E, Huang Y (2021) An adaptive large neighborhood search heuristic for the vehicle routing problem with time windows and delivery robots. Eur J Oper Res 294:1164–1180
MathSciNet Google Scholar
Dong W, Zhou M (2017) A supervised learning and control method to improve particle swarm optimization algorithms. IEEE Trans Syst Man Cybern Syst 47:1135–1148
Google Scholar
Fan H, Zhang Y, Tian P, Lv Y, Fan H (2021) Time-dependent multi-depot green vehicle routing problem with time windows considering temporal-spatial distance. Comput Oper Res 129:105211
MathSciNet Google Scholar
Fan L, Liu C, Dai B, Li J, Wu Z, Guo Y (2023) Electric vehicle routing problem considering energy differences of charging stations. Journal of Cleaner Production 418:138184. https://doi.org/10.1016/j.jclepro.2023.138184 https://www.sciencedirect.com/science/article/pii/S0959652623023429
Feng B, Wei L (2023) An improved multi-directional local search algorithm for vehicle routing problem with time windows and route balance. Appl Intell 53:11786–11798. https://doi.org/10.1007/s10489-022-04061-7
Article Google Scholar
Feng L, Zhou L, Gupta A, Zhong J, Zhu Z, Tan KC, Qin K (2019) Solving generalized vehicle routing problem with occasional drivers via evolutionary multitasking. IEEE Trans Cybern 1–14. https://doi.org/10.1109/TCYB.2019.2955599
Feng Y, Zhou M, Tian G, Li Z, Zhang Z, Zhang Q, Tan J (2019) Target disassembly sequencing and scheme evaluation for CNC machine tools using improved multiobjective ant colony algorithm and fuzzy integral. IEEE Trans Syst Man Cybern: Syst 49:2438–2451
Google Scholar
Fernández Gil A, Lalla-Ruiz E, Gómez Sánchez M, Castro C (2023) The cumulative vehicle routing problem with time windows: models and algorithm. Annals Oper Res. https://doi.org/10.1007/s10479-022-05102-7
Gao S, Zhou M, Wang Y, Cheng J, Yachi H, Wang J (2019) Dendritic neuron model with effective learning algorithms for classification, approximation, and prediction. IEEE Trans Neural Netw Learn Syst 30:601–614
PubMed Google Scholar
Gao Y, Wu H, Wang W (2023) A hybrid ant colony optimization with fireworks algorithm to solve capacitated vehicle routing problem. Appl Intell 53:7326–7342. https://doi.org/10.1007/s10489-022-03912-7
Article Google Scholar
Gmira M, Gendreau M, Lodi A, Potvin JY (2021) Tabu search for the time-dependent vehicle routing problem with time windows on a road network. Eur J Oper Res 288:129–140
MathSciNet Google Scholar
Goeke D (2019) Granular tabu search for the pickup and delivery problem with time windows and electric vehicles. Eur J Oper Res 276:821–836. https://doi.org/10.1016/j.ejor.2019.05.010
Article MathSciNet Google Scholar
Goel R, Maini R (2018) A hybrid of ant colony and firefly algorithms (HAFA) for solving vehicle routing problems. J Comput Sci 25:28–37. https://doi.org/10.1016/j.jocs.2017.12.012
Article MathSciNet ADS Google Scholar
Goel RK, Rani Bansal S (2020) Hybrid algorithms for rich vehicle routing problems: a survey. In: Nalepa J (ed) Smart delivery systems. Elsevier, Intelligent Data-Centric Systems, pp 157–184
Han X (2023) Path planning algorithm for the multiple depot vehicle routing problem based on parallel clustering. Sci Program 2023:7588595. https://doi.org/10.1155/2023/7588595
Article Google Scholar
Herath H, Mittal M (2022) Adoption of artificial intelligence in smart cities: a comprehensive review. Int J Inf Manag Data Insights 2:100076. https://doi.org/10.1016/j.jjimei.2022.100076 https://www.sciencedirect.com/science/article/pii/S2667096822000192
Hou Y, Wu N, Zhou M, Li Z (2017) Pareto-optimization for scheduling of crude oil operations in refinery via genetic algorithm. IEEE Trans Syst Man Cybern Syst 47:517–530
Google Scholar
Ilhan I (2021) An improved simulated annealing algorithm with crossover operator for capacitated vehicle routing problem. Swarm Evol Comput 64:100911
Google Scholar
Jastrzab T, Buchcik A (2020) Practical applications of smart delivery systems. In: Nalepa J (ed) Smart delivery systems. solving complex vehicle routing problems. Elsevier, Intelligent Data Centric Systems, pp 249–268
Jastrzab T, Myller M, Tulczyjew L, Blocho M, Ryczko W, Kawulok M, Nalepa J (2022) Particle swarm optimization configures the route minimization algorithm. In: Groen D, de Mulatier C, Paszynski M, Krzhizhanovskaya VV, Dongarra JJ, Sloot PMA (eds) Computational Science - ICCS 2022. Springer International Publishing, Cham, pp 80–87
Google Scholar
Jia H, Miao H, Tian G, Zhou M, Feng Y, Li Z, Li J (2020) Multiobjective bike repositioning in bike-sharing systems via a modified artificial bee colony algorithm. IEEE Trans Autom Sci Eng 17:909–920
Google Scholar
Jiang G, Lam SK, Ning F, He P, Xie J (2020) Peak-hour vehicle routing for first-mile transportation: problem formulation and algorithms. IEEE Trans Intell Transp Syst 21:3308–3321
Jin X, Qin H, Zhang Z, Zhou M, Wang J (2021) Planning of garbage collection service: an arc-routing problem with time-dependent penalty cost. IEEE Trans Intell Transp Syst 22:2692–2705
Kalatzantonakis P, Sifaleras A, Samaras N (2023) A reinforcement learning-variable neighborhood search method for the capacitated vehicle routing problem. Expert Sys Appl 213:118812. https://doi.org/10.1016/j.eswa.2022.118812 https://www.sciencedirect.com/science/article/pii/S0957417422018309
Kapoor S, Narayanan A (2022) Leakage and the Reproducibility Crisis in ML-based Science. arXiv:2207.07048, 10.48550/ARXIV.2207.07048
Keskin M, Çatay B, Laporte G (2021) A simulation-based heuristic for the electric vehicle routing problem with time windows and stochastic waiting times at recharging stations. Comput Oper Res 125:105060
MathSciNet Google Scholar
Khan AH, Cao X, Li S, Katsikis VN, Liao L (2020) Bas-adam: an adam based approach to improve the performance of beetle antennae search optimizer. IEEE/CAA Journal of Automatica Sinica 7:461–471
Google Scholar
Kim G, Ong Y, Heng CK, Tan PS, Zhang NA (2015) City vehicle routing problem (City VRP): a review. IEEE Trans Intell Transp Syst 16:1654–1666. https://doi.org/10.1109/TITS.2015.2395536
Konstantakopoulos G, Gayialis S, Kechagias E (2020) Vehicle routing problem and related algorithms for logistics distribution: a literature review and classification. Oper Res. https://doi.org/10.1007/s12351-020-00600-7
Article Google Scholar
Lai D, Demirag O, Leung J (2016) A tabu search heuristic for the heterogeneous vehicle routing problem on a multigraph. Transp Res Part E 86:32–52
Google Scholar
Lespay H, Suchan K (2021) A case study of consistent vehicle routing problem with time windows. Int Trans Oper Res 28:1135–1163
MathSciNet Google Scholar
Li H, Li Z, Cao L, Wang R, Ren M (2020) Research on optimization of electric vehicle routing problem with time window. IEEE Access 8. https://doi.org/10.1109/ACCESS.2020.3014638
Lin B, Ghaddar B, Nathwani J (2021) Deep reinforcement learning for the electric vehicle routing problem with time windows. IEEE Trans Intell Transp Syst 1–11. https://doi.org/10.1109/TITS.2021.3105232
Liu J, Feng S, Niu Q, Li L (2019) New construction heuristic algorithm for solving the vehicle routing problem with time windows. IET Collab Intell Manuf 1:90–96. https://doi.org/10.1049/iet-cim.2019.0035
Article Google Scholar
Liu X, Wang D, Yin Y, Cheng T (2023) Robust optimization for the electric vehicle pickup and delivery problem with time windows and uncertain demands. Comput Oper Res 151:106119. https://doi.org/10.1016/j.cor.2022.106119 https://www.sciencedirect.com/science/article/pii/S0305054822003495
Lorenzo PR, Nalepa J, Kawulok M, Ramos LS, Pastor JR (2017) Particle swarm optimization for hyper-parameter selection in deep neural networks. Proc. GECCO, ACM, USA, pp 481–488
Google Scholar
López-Ibáñez M, Dubois-Lacoste J, Pérez Cáceres L, Birattari M, Stützle T (2016) The irace package: iterated racing for automatic algorithm configuration. Oper Res Perspect. 3:43–58
Lv Z, Shang W (2023) Impacts of intelligent transportation systems on energy conservation and emission reduction of transport systems: a comprehensive review. Green Technologies and Sustainability 1:100002. https://doi.org/10.1016/j.grets.2022.100002 https://www.sciencedirect.com/science/article/pii/S2949736122000021
Moghdani R, Salimifard K, Demir E, Benyettou A (2021) The green vehicle routing problem: a systematic literature review. J Clean Prod 279:123691
Mohamed E, Ndiaye M (2018) Optimal routing and scheduling in e-commerce logistics using crowdsourcing strategies, In: 7th International conference on industrial technology and management, IEEE. pp 248–253
Mor A, Speranza MG (2020) Vehicle routing problems over time: a survey. 4OR 18:129–149
MathSciNet Google Scholar
Muniasamy RP, Singh S, Nasre R, Narayanaswamy N (2023) Effective parallelization of the vehicle routing problem, In: Proceedings of the genetic and evolutionary computation conference, Association for Computing Machinery, New York, USA. pp 1036-1044. https://doi.org/10.1145/3583131.3590458
Máximo VR, Nascimento MC (2021) A hybrid adaptive iterated local search with diversification control to the capacitated vehicle routing problem. Eur J Oper Res 294:1108–1119
MathSciNet Google Scholar
Nalepa J, Blocho M (2015) Co-operation in the parallel memetic algorithm. Int J Parallel Program 43:812–839. https://doi.org/10.1007/s10766-014-0343-4
Article Google Scholar
Nalepa J, Blocho M, (2015b) A parallel algorithm with the search space partition for the pickup and delivery with time windows, In: 10th international conference on P2P, parallel, grid, cloud and internet computing, IEEE. pp 92–99. https://doi.org/10.1109/3PGCIC.2015.12
Nalepa J, Blocho M (2016) Adaptive memetic algorithm for minimizing distance in the vehicle routing problem with time windows. Soft Comput 20:2309–2327
Google Scholar
Nalepa J, Blocho M (2017) Adaptive guided ejection search for pickup and delivery with time windows. J Intell Fuzzy Syst 32:1547–1559
Google Scholar
Nalepa J, Blocho M, (2017b) A parallel memetic algorithm for the pickup and delivery problem with time windows, In: 25th Euromicro international conference on parallel, distributed and network-based processing, IEEE. pp 1–8. https://doi.org/10.1109/PDP.2017.75
Nalepa J, Blocho M (2018) Adaptive cooperation in parallel memetic algorithms for rich vehicle routing problems. Int J Grid Util Comput 9:179–192
Google Scholar
Nalepa J, Czech Z, (2013) New selection schemes in a memetic algorithm for the vehicle routing problem with time windows, In: Adaptive and natural computing algorithms, Springer. pp 396–405
Osaba E, Yang X, Fister I Jr, Del Ser J, Lopez-Garcia P, Vazquez-Pardavila A (2019) A discrete and improved bat algorithm for solving a medical goods distribution problem with pharmacological waste collection. Swarm Evol Comput 44:273–286. https://doi.org/10.1016/j.swevo.2018.04.001
Article Google Scholar
Osorio-Mora A, Escobar JW, Toth P (2023) An iterated local search algorithm for latency vehicle routing problems with multiple depots. Comput Oper Res 158:106293. https://doi.org/10.1016/j.cor.2023.106293 (https://www.sciencedirect.com/science/article/pii/S0305054823001570)
Article MathSciNet Google Scholar
Oumachtaq A, Ouzizi L, Douimi M (2023) Green vehicle routing problem (gvrp): State-of-the-art. In: Azrar L, Jalid A, Lamouri S, Siadat A, Taha Janan M, Chaari F, Haddar M (eds) Advances in Integrated Design and Production II. Springer International Publishing, Cham, pp 406–425
Google Scholar
Pan B, Zhang Z, Lim A (2021) A hybrid algorithm for time-dependent vehicle routing problem with time windows. Comput Oper Res 128:105193
MathSciNet Google Scholar
Pan B, Zhang Z, Lim A (2021) Multi-trip time-dependent vehicle routing problem with time windows. Eur J Oper Res 291:218–231
MathSciNet Google Scholar
Pan F, Ye C, Wang K, Cao J (2013) Research on the vehicle routing problem with time windows using firefly algorithm. J Comput 8:2256–2261
Google Scholar
Pellegrini P, Favaretto D, Moretti E (2007) Multiple ant colony optimization for a rich vehicle routing problem: a case study. In: Apolloni B, Howlett R, Jain L (eds) Knowledge-Based Intelligent Information and Engineering Systems. Springer, Berlin-Heidelberg, pp 627–634
Google Scholar
Pessoa A, Sadykov R, Uchoa E, Vanderbeck F (2020) A generic exact solver for vehicle routing and related problems. Math Program 183:483–523
MathSciNet Google Scholar
Queiroga E, Sadykov R, Uchoa E (2021) A popmusic matheuristic for the capacitated vehicle routing problem. Comput Oper Res 136:105475
MathSciNet Google Scholar
Quirion-Blais O, Chen L (2021) A case-based reasoning approach to solve the vehicle routing problem with time windows and drivers’ experience. Omega 102:102340
Google Scholar
RamachandranPillai R, Arock M (2021) Spiking neural firefly optimization scheme for the capacitated dynamic vehicle routing problem with time windows. Neural Comput Appl 33:409–432
Google Scholar
Rasku J, Musliu N, Kärkkäinen T (2019) On automatic algorithm configuration of vehicle routing problem solvers. J Vehicle Routing Algorithms 2:1–22
Google Scholar
Rastani S, Çatay B (2021) A large neighborhood search-based matheuristic for the load-dependent electric vehicle routing problem with time windows. Annals Oper Res
Sar K, Ghadimi P (2023) A systematic literature review of the vehicle routing problem in reverse logistics operations. Comput Ind Eng 177:109011. https://doi.org/10.1016/j.cie.2023.109011 https://www.sciencedirect.com/science/article/pii/S0360835223000359
Sicilia J, Quemada C, Royo B, Escuin D (2016) An optimization algorithm for solving the rich vehicle routing problem based on variable neighborhood search and tabu search metaheuristics. J Comput Appl Math 291:468–477
MathSciNet Google Scholar
Siglidis G, Nikolentzos G, Limnios S, Giatsidis C, Skianis K, Vazirgiannis M (2020) GraKeL: a graph kernel library in Python. J Mach Learn Res 21:1–5
Sitek P, Wikarek J, Rutczyńska-Wdowiak K, Bocewicz G, Banaszak Z (2021) Optimization of capacitated vehicle routing problem with alternative delivery, pick-up and time windows: a modified hybrid approach. Neurocomputing 423:670–678
Sluijk N, Florio AM, Kinable J, Dellaert N, Van Woensel T (2023) Two-echelon vehicle routing problems: a literature review. Eur J Oper Res 304:865–886. https://doi.org/10.1016/j.ejor.2022.02.022 https://www.sciencedirect.com/science/article/pii/S0377221722001278
Smith-Miles K, Bowly S (2015) Generating new test instances by evolving in instance space. Comput Oper Res 63:102–113
MathSciNet Google Scholar
Souza IP, Boeres MCS, Moraes REN (2023) A robust algorithm based on differential evolution with local search for the capacitated vehicle routing problem. Swarm Evol Comput 77:101245. https://doi.org/10.1016/j.swevo.2023.101245 (https://www.sciencedirect.com/science/article/pii/S2210650223000196)
Article Google Scholar
Sörensen K (2015) Metaheuristics-the metaphor exposed. Int Trans Oper Res 22:3–18
MathSciNet Google Scholar
Srivastava G, Singh A, Mallipeddi R (2021) NSGA-II with objective-specific variation operators for multiobjective vehicle routing problem with time windows. Expert Syst Appl 176:114779
Google Scholar
Susanty A, Putri VA, Purwanggono B (2022) Proposed policy to manage the barrier of the implementation of intelligent transportation system. Int J Intell Transp Syst Res 20:540–559. https://doi.org/10.1007/s13177-022-00310-8
Article Google Scholar
Szwarc K, Nowakowski P, Boryczka U (2021) An evolutionary approach to the vehicle route planning in e-waste mobile collection on demand. Soft Comput 25:6665–6680
Google Scholar
Tan F, Chai, Zy, Li, Yl, (2023) Multi-objective evolutionary algorithm for vehicle routing problem with time window under uncertainty. Evol Intel 16:493–508. https://doi.org/10.1007/s12065-021-00672-0
Tavares L, Lopes H, Lima C (2009) Construction and improvement heuristics applied to the capacitated vehicle routing problem, In: World congress on nature & biologically inspired computing, IEEE. pp 690–695. https://doi.org/10.1109/NABIC.2009.5393467
Uchoa E, Pecin D, Pessoa A, Poggi M, Vidal T, Subramanian A (2017) New benchmark instances for the capacitated vehicle routing problem. Eur J Oper Res 257:845–858
Wang F, Liao F, Li Y, Yan X, Chen X (2021) An ensemble learning based multi-objective evolutionary algorithm for the dynamic vehicle routing problem with time windows. Comput Ind Eng 154:107131
Google Scholar
Wang J, Kumbasar T (2019) Parameter optimization of interval type-2 fuzzy neural networks based on pso and bbbc methods. IEEE/CAA J Autom Sin 6:247–257
Google Scholar
Wang X, Shao S, Tang J (2021) Iterative local-search heuristic for weighted vehicle routing problem. IEEE Trans Intell Transp Syst 22:3444–3454
Google Scholar
Wang Y, Li Q, Guan X, Fan J, Xu M, Wang H (2021) Collaborative multi-depot pickup and delivery vehicle routing problem with split loads and time windows. Knowl-Based Syst 231:107412
Google Scholar
Wei G, Wu Q, Zhou M (2021) A hybrid probabilistic multiobjective evolutionary algorithm for commercial recommendation systems. IEEE Trans Comput Social Syst 8:589–598
Google Scholar
Wolfinger D (2021) A large neighborhood search for the pickup and delivery problem with time windows, split loads and transshipments. Comput Oper Res 126:105110
MathSciNet Google Scholar
Wu Y, Cai Y, Fang C (2023) Evolutionary multitasking for bidirectional adaptive codec: a case study on vehicle routing problem with time windows. Appl Soft Comput 145:110605. https://doi.org/10.1016/j.asoc.2023.110605 https://www.sciencedirect.com/science/article/pii/S1568494623006233
Xue S (2023) An adaptive ant colony algorithm for crowdsourcing multi-depot vehicle routing problem with time windows. Sustain Oper Comput 4:62–75. https://doi.org/10.1016/j.susoc.2023.02.002 https://www.sciencedirect.com/science/article/pii/S2666412723000028
Yang N, Zheng Z, Zhou M, Guo X, Qi L, Wang T (2021) A domain-guided noise-optimization-based inversion method for facial image manipulation. IEEE Trans Image Process 30:6198–6211
Yu JJQ, Yu W, Gu J (2019) Online vehicle routing with neural combinatorial optimization and deep reinforcement learning. IEEE Trans Intell Transp Syst 20:3806–3817
Google Scholar
Yuan H, Zhou M, Liu Q, Abusorrah A (2020) Fine-grained resource provisioning and task scheduling for heterogeneous applications in distributed green clouds. IEEE/CAA J Autom Sin 7:1380–1393
Google Scholar
Yuan Y, Cattaruzza D, Ogier M, Semet F, Vigo D (2021) A column generation based heuristic for the generalized vehicle routing problem with time windows. Transp Res Part E: Logist Transp Rev 152:102391
Google Scholar
Zeni GA, Menzori M, Martins PS, Meira LAA (2016) VRPBench: A vehicle routing benchmark tool. arXiv:1610.05402
Zhang J, Yang F, Weng X (2018) An evolutionary scatter search particle swarm optimization algorithm for the vehicle routing problem with time windows. IEEE Access 6:63468–63485
Google Scholar
Zhang P, Shu S, Zhou M (2018) An online fault detection model and strategies based on SVM-grid in clouds. IEEE/CAA J Autom Sin 5:445–456
Google Scholar
Zhang P, Zhou M, Wang X (2020) An intelligent optimization method for optimal virtual machine allocation in cloud data centers. IEEE Trans Autom Sci Eng 17:1725–1735
Google Scholar
Zhao Z, Liu S, Zhou M, Abusorrah A (2021) Dual-objective mixed integer linear program and memetic algorithm for an industrial group scheduling problem. IEEE/CAA J Autom Sin 8:1199–1209
Google Scholar
Zunic E, Delalic S, Donko D (2020a) Adaptive multi-phase approach for solving the realistic vehicle routing problems in logistics with innovative comparison method for evaluation based on real GPS data. Transp Lett. https://doi.org/10.1080/19427867.2020.1824311
Zunic E, Donkox D, Supic H, Delalic S (2020b) Cluster-based approach for successful solving real-world vehicle routing problems, In: Proceedings of the federated conference on computer science and information systems, Springer. pp 619–626. https://doi.org/10.15439/2020F184

Download references

Funding

This work was conducted as a part of the project “Development of a vehicle fleet management system with a passenger application to limit vehicular traffic in cities” awarded to Blees Sp. z o.o. and was supported by the European Union under grants POIR.04.01.01-00-0079/18-00 (received from the Polish National Centre for Research and Development) and UDA-RPSL.01.02.00-24-00FG/19-00 (received within the Regional Operational Program for the Silesian Voivodship 2014-2020). This work was also supported by the Silesian University of Technology grant for maintaining and developing research potential. JN was supported by the Silesian University of Technology Rector’s Research and Development Grant (02/080/RGJ22/0026).

Author information

Authors and Affiliations

Silesian University of Technology, Akademicka 16, Gliwice, Poland
Tomasz Jastrzab, Michal Myller, Lukasz Tulczyjew, Michal Kawulok, Adam Czornik & Jakub Nalepa
Blees Sp. z o.o, Z. Starego 27A, Gliwice, Poland
Miroslaw Blocho

Authors

Tomasz Jastrzab
View author publications
You can also search for this author in PubMed Google Scholar
Michal Myller
View author publications
You can also search for this author in PubMed Google Scholar
Lukasz Tulczyjew
View author publications
You can also search for this author in PubMed Google Scholar
Miroslaw Blocho
View author publications
You can also search for this author in PubMed Google Scholar
Michal Kawulok
View author publications
You can also search for this author in PubMed Google Scholar
Adam Czornik
View author publications
You can also search for this author in PubMed Google Scholar
Jakub Nalepa
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Tomasz Jastrzab: Conceptualization, Software, Investigation, Methodology, Validation, Visualization, Writing—origi-nal draft, Writing—revised draft Michal Myller: Investigation, Software, Methodology, Data curation, Writing—review Lukasz Tulczyjew: Investigation, Software, Methodology, Data curation, Writing—review Miroslaw Blocho: Investigation, Methodology, Software, Data curation, Writing—review Michal Kawulok: Investigation, Formal analysis, Methodology, Writ-ing—review Adam Czornik: Formal analysis, Methodology, Writing—review Jakub Nalepa: Conceptualization, Formal analysis, Investigation, Met-hodology, Supervision, Validation, Data curation, Funding acquisition, Writing—original draft, Writing—revised

Corresponding author

Correspondence to Jakub Nalepa.

Ethics declarations

Competing Interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Jastrzab, T., Myller, M., Tulczyjew, L. et al. Standardized validation of vehicle routing algorithms. Appl Intell 54, 1335–1364 (2024). https://doi.org/10.1007/s10489-023-05212-0

Download citation

Accepted: 05 December 2023
Published: 05 January 2024
Issue Date: January 2024
DOI: https://doi.org/10.1007/s10489-023-05212-0

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Standardized validation of vehicle routing algorithms

Abstract

Similar content being viewed by others

How to Generate Benchmarks for Rich Routing Problems?

A hybrid heuristic for a broad class of vehicle routing problems with heterogeneous fleet

A New Rich Vehicle Routing Problem Model and Benchmark Resource

1 Introduction

1.1 Related work

1.2 Contribution

1.3 Paper structure

2 Method

2.1 Overview of the validation approach

2.2 Benchmark generator

2.3 Quantitative, qualitative, and statistical analysis

2.3.1 Assessment of solutions: feasibility and quality

2.3.2 Assessment of algorithms: statistical and non-functional analysis

2.3.3 Assessment of solutions and algorithms: visualization and exploration

3 Experiments

3.1 Problem formulation

3.1.1 Objectives

3.1.2 Constraints

3.2 Experiment 1: end-to-end processing

3.3 Experiment 2: comparing algorithms

3.4 Experiment 3: real-life use case

4 Conclusions and future work

Data Availability

Notes

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing Interests

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation