Dynamic vehicle routing with time windows in theory and practice

The vehicle routing problem is a classical combinatorial optimization problem. This work addresses a variant with dynamically changing orders and time windows. In real-world applications, demands often change during operation time: new orders arrive and others are canceled, so new schedules need to be generated on the fly. Online optimization algorithms for dynamic vehicle routing address this problem, but so far they do not consider time windows. Moreover, to match the scenarios found in real-world problems, adaptations of benchmarks are required. In this paper, a practical problem is modeled based on the daily routing procedure of a delivery company: new customer orders arrive dynamically during the working day and need to be integrated into the schedule. A multiple ant colony algorithm combined with powerful local search procedures is proposed to solve the dynamic vehicle routing problem with time windows. Its performance is tested on a new benchmark that simulates a working day: the problems are taken from Solomon’s benchmarks, but a certain percentage of the orders is only revealed to the algorithm during operation time. Different versions of the MACS algorithm are tested and a high-performing variant is identified. Finally, the algorithm is tested in situ: in a field study, it schedules a fleet of cars for a surveillance company. We compare its performance to that of the procedure used by the company and summarize insights gained from the implementation of the real-world study. The results show that the multiple ant colony algorithm obtains much better solutions on the academic benchmark and can also be integrated in a real-world environment.


Introduction
The vehicle routing problem (VRP) is a combinatorial optimization problem that has been studied for a long time in the literature, for example in Bianchi et al. (2009), Marinakis et al. (2010), Xiao et al. (2012), Pillac et al. (2013) and Yang et al. (2015). The aim of this problem is to deliver orders from a depot to customers using a fleet of vehicles. Here we look at a practically important variant of this problem in which new events (demands, orders) are introduced dynamically during operation time and cars have to serve customers within given time windows. So far the problems of dynamic events and time windows have only been studied in isolation; in this paper we propose and analyze an algorithm that can deal with both dynamicity and time windows. Since the VRP is NP-hard already in its most basic variant, it seems unlikely that efficient exact solvers for larger instances can be built, and one has to rely on heuristics and meta-heuristics for finding good solutions. Common approaches include problem-specific heuristics such as the savings heuristic, local search meta-heuristics, and approaches from natural computing such as ant colony optimization. Yet the most powerful solvers today combine several of these methods and could be termed hybrid solvers.
In this article such a hybrid solver is developed. In its global search architecture it uses an ant colony optimization system, whereas its initialization and search operators use problem-specific construction and local search methods. More specifically, the multi ant colony system (MACS) is adapted to solve the real-world dynamic vehicle routing problem. MACS was first proposed by Gambardella et al. (1999), who used two ant colonies to search for the best solution to the vehicle routing problem in order to improve the performance of ant colonies: the first colony minimizes the number of vehicles, while the second minimizes the travel cost. van Veen et al. (2013) generated a dynamic vehicle routing problem with time windows (DVRPTW) benchmark based on the static Solomon benchmark and adjusted MACS to this dynamic problem. This article extends that conference paper by providing a more in-depth discussion and motivation of the approach and benchmark designs. More importantly, we add results from a real-world pilot study provided by a Dutch mobile surveillance company. The paper is organized as follows: the problem is formally described in Sect. 2, and related work is summarized in Sect. 3. Section 4 describes the MACS algorithm and how it is adapted to the dynamic vehicle routing problem with time windows. Section 5 introduces a benchmark for this problem class, reports the performance of the algorithm on it, and also includes results on static benchmarks for validation. The real-world study, set up in Rotterdam, is described in Sect. 6, together with the experiences gained from the case study. Section 7 reviews the main results of this article. Finally, Sect. 8 summarizes the work and suggests directions for relevant future research.
Problem description

Static vehicle routing problem

The classical VRP formulation was first defined by Dantzig and Ramser (1959). In the classical VRP, a fleet of vehicles seeks to visit all customer orders at minimum travel cost. The problem is NP-hard, and the well-known traveling salesman problem (TSP) is a special case. Next, we look at the capacitated VRP (CVRP), where each vehicle has a maximal capacity. It can be modeled by a weighted digraph G = (V, A), where V = {v_0, v_1, ..., v_N} is a vertex set representing the customers and A = {(v_i, v_j) : i ≠ j} is an arc set, with (v_i, v_j) representing the path from customer i to customer j. Vertex v_0 represents the depot, which hosts M vehicles, and the vertices v_1, ..., v_N denote the customers that need to be served. Each vehicle has a maximal capacity Q, and each customer v_i is associated with a demand q_i of goods to be delivered (the demand q_0 = 0 is assigned to the depot v_0), a time window [e_i, l_i] from the earliest to the latest starting time of the service, and a service duration s_i. Each arc (v_i, v_j) carries a non-negative weight representing its traveling cost c_ij. There are N customers and M vehicles, and the goal is to minimize the total traveling cost.
Formally, the CVRP can be defined as a mathematical programming problem with binary decision variables (cf. Christofides et al. 1981; Cordeau et al. 2001). Let ξ_ijk = 1 if vehicle k visits customer v_j immediately after visiting customer v_i, and ξ_ijk = 0 otherwise. The objective is to minimize the total traveling cost, subject to the following constraints.
Eq. 2a: Each customer must be visited exactly once.
Eq. 2b: If a vehicle visits a customer, it must also depart from it.
Eq. 2c: The total quantity carried by each vehicle is less than or equal to the maximal capacity Q.
Eq. 2d: The total traveling time of each vehicle is less than or equal to a given time T.
Eq. 2e: Each vehicle must be used exactly once.
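A standard three-index formulation consistent with the constraint descriptions above can be sketched as follows (following Christofides et al. 1981; the precise index conventions are our assumption, with ξ_ijk as the binary routing variables):

```latex
\begin{align}
\min \; & \sum_{k=1}^{M} \sum_{i=0}^{N} \sum_{j=0}^{N} c_{ij}\, \xi_{ijk} \\
\text{s.t.}\;
& \sum_{k=1}^{M} \sum_{i=0}^{N} \xi_{ijk} = 1 && j = 1,\dots,N \tag{2a} \\
& \sum_{i=0}^{N} \xi_{ihk} - \sum_{j=0}^{N} \xi_{hjk} = 0 && h = 0,\dots,N,\; k = 1,\dots,M \tag{2b} \\
& \sum_{i=1}^{N} q_i \sum_{j=0}^{N} \xi_{ijk} \le Q && k = 1,\dots,M \tag{2c} \\
& \sum_{i=0}^{N} \sum_{j=0}^{N} \left(c_{ij} + s_i\right) \xi_{ijk} \le T && k = 1,\dots,M \tag{2d} \\
& \sum_{j=1}^{N} \xi_{0jk} = 1 && k = 1,\dots,M \tag{2e}
\end{align}
```

Here (2b) ensures flow conservation at every node for every vehicle, and (2e) sends each vehicle out of the depot exactly once.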
In this work we consider the capacitated vehicle routing problem with time windows in which the customers must be served (CVRPTW). Additional constraints are needed to model the time windows: the service start time t_i at vertex v_i must lie within the time window [e_i, l_i].

Dynamic vehicle routing problem
In the real world, most delivery problems are dynamic vehicle routing problems. Psaraftis (1995) pointed out the difference between static and dynamic VRPs. In static VRPs, the information about all orders is known in advance. In dynamic problems, some of the orders are given initially and an initial schedule is generated, but new orders are received while the vehicles are already executing their routes, and the routes have to be rearranged in order to serve these new orders. The challenge is to produce a high-quality solution quickly whenever a new event happens.
To be able to solve a dynamic problem we first have to simulate a form of dynamicity. Kilby et al. (1998) described a method to do this, which is also used by Montemanni et al. (2005). They proposed to partition the working day into time slices and solve the problem incrementally. The notion of a working day of T_wd seconds is introduced, which is simulated by the algorithm. Not all nodes are available to the algorithm at the beginning: a subset of the nodes is assigned an available time at which they become known. The fraction of such nodes determines the degree of dynamicity of the problem. At the beginning of the day a tentative tour is created from the a-priori available nodes. The working day is divided into n_ts time slices of length t_ts := T_wd / n_ts, and at each time slice the solution is updated. This allows us to split the dynamic problem into n_ts static problems, which can be solved consecutively. The goal in the DVRPTW is similar to that of static VRPs, except that some customers and their time windows are unknown a-priori and parts of the solution may already have been committed.
In our approach the previous solution and the pheromone distribution of the ant colony optimization algorithm are used to initialize the optimization in a time slice, because we expect the new solution not to be entirely different from the previous one. A different approach would be to restart the algorithm from scratch every time a node becomes available. However, this strategy is too time consuming for algorithms used in real-time operation on the typical hardware of logistics service providers.
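The time-slice scheme can be sketched as follows. The names are hypothetical: `solve_static` stands in for one MACS run on the currently revealed orders, warm-started with the previous solution.

```python
def run_working_day(orders, T_wd, n_ts, solve_static):
    """Split a working day of T_wd seconds into n_ts slices and solve
    the revealed sub-problem in each slice (after Kilby et al., 1998)."""
    t_ts = T_wd / n_ts
    solution = None
    for slice_idx in range(n_ts):
        t = slice_idx * t_ts  # start of the current time slice
        # Only orders whose available time has passed are visible.
        revealed = [o for o in orders if o["available"] <= t]
        # Warm start: reuse the previous solution (and, in MACS, the
        # pheromone matrices) instead of restarting from scratch.
        solution = solve_static(revealed, previous=solution)
    return solution
```

With `n_ts` static sub-problems solved consecutively, the degree of dynamicity is simply the fraction of orders with a nonzero `available` time.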

Related work
In general, the VRP and the VRPTW are NP-hard problems that generalize the NP-complete traveling salesman problem. Therefore heuristic algorithms are widely used to solve the vehicle routing problem. Classical examples are the nearest neighbor heuristic by Flood (1956) and the savings algorithm developed by Clarke and Wright (1964), which repeatedly merges the routes of two customers based on the savings concept. Early advances were achieved by Shaw (1998) using large neighborhood search.
Nowadays, the use of meta-heuristics has become more and more popular. Semet and Taillard (1993) presented a tabu search for finding good solutions to the vehicle routing problem. Baker and Ayechew (2003) combined a genetic algorithm with neighborhood search methods, which gives reasonable results for this problem. Gambardella et al. (1999) applied ant colony optimization, using artificial ant colonies to construct short routes.
In contrast to the large multitude of available static VRP solvers, there are only a few algorithms that can tackle dynamic VRPs. In principle, most of the algorithms described above can be adapted to solve dynamic VRPs. But to deal efficiently with the dynamics of this problem, an algorithm should also have mechanisms that promote reusing features of the problem learned from previous solutions. As indicated in Eyckelhof and Snoek (2002), bio-mimetic ant colony optimization algorithms seem to support dynamic adaptation of delivery routes well: the virtual pheromone trails indicate good directions when solutions only need to be changed partially.
Ant colony optimization (ACO) is a meta-heuristic based on the natural behavior of ant colonies, proposed by Dorigo (1992) in his Ph.D. thesis. It has since been employed for a number of combinatorial optimization problems, such as scheduling problems in Xiao et al. (2013) and Chen and Zhang (2013), routing problems in Balaprakash et al. (2009) and Toth and Vigo (2014), assignment problems in Stützle (2010) and D'Acierno et al. (2012), and set problems in Ren et al. (2010) and Jovanovic and Tuba (2013). Moreover, ACO can easily be combined with local search heuristics and route construction algorithms. The flexibility of ACO and its good performance on the static vehicle routing problem make it an attractive paradigm for the dynamic vehicle routing problem. Ant-based methods were first proposed with the ant system in Colorni et al. (1991). These methods simulate a population of ants which communicate with each other via pheromones and collectively solve complex path-finding problems, a phenomenon called stigmergy. For the VRPTW, an ant-based method was proposed by Gambardella et al. (1999). They showed that good results can be achieved by running one ant colony to optimize the number of vehicles and another to minimize route cost, and termed their method the multi ant colony system (MACS). The paradigm of ant algorithms fits dynamic problems well (Guntsch and Middendorf 2002), including the dynamic TSP (Eyckelhof and Snoek 2002) and special types of VRP where vehicles do not have to return to the depot (Montemanni et al. 2005). In this article we extend multi ant colony optimization to problems with time windows and call the new method MACS-DVRPTW.
There are some previous studies on applying meta-heuristics other than ant colony algorithms to the DVRPTW. Gendreau et al. (1999) proposed tabu search but, as opposed to the standard benchmarks for MACS-VRPTW, developed their approach for problems with soft time windows.

Algorithm
In order to solve this problem, it is natural to extend the state-of-the-art ant algorithm for the VRPTW to the dynamic case. To the best of our knowledge, the multi-colony approach described in Gambardella et al. (1999) is the best ant algorithm for the VRPTW with a description that allows results to be reproduced, and it shows good performance on the standard benchmark problems by Solomon. Here we directly describe our new dynamic version of this algorithm and indicate the changes.
The central part of the algorithm is the controller. It reads the benchmark data, initializes data structures, builds an initial solution and starts the ACS-TIME and ACS-VEI colonies. The ACS-TIME colony tries to minimize the traveling cost given a fixed number of vehicles, while the ACS-VEI colony seeks to minimize the number of vehicles. The algorithm gives priority to reducing the number of vehicles; among solutions with the same number of vehicles, those that use less time are preferred. The ACS-VEI colony restarts the ACS-TIME colony whenever a solution is found that can serve the demand with a smaller number of vehicles.
The nearest neighbor heuristic of Flood (1956) is commonly used to find initial solutions for vehicle routing problems, but for VRPs with time windows it is difficult to obtain a feasible solution with this method. It therefore has to be adjusted in two ways: first, the time-window constraints have to be checked to make sure no infeasible tours are created; second, a limit on the number of vehicles is passed to the function. Hence a more appropriate algorithm is needed to generate the initial solution. Because of these limitations, it is not always possible to return a tour that incorporates all nodes; in that case a tour with fewer nodes is returned.
To this end, the new Ranking Time Windows Based Nearest Neighbor algorithm is proposed to generate the initial solution for the DVRPTW. By inserting the orders one by one, sorted by their earliest arrival times, into exactly n_v tours, this algorithm takes the time-window and vehicle-number constraints into account from the start. This way there is a higher chance of obtaining a feasible solution with a better fitness value. Algorithm 1 describes the initialization. It proceeds as follows: first, the list of customers is sorted by increasing earliest arrival times. Then n_v tours are created, each corresponding to one vehicle. For each customer node, the tour with the smallest distance among all tours into which the node can be inserted without violating constraints is found, and the node is appended to it. Following this procedure, the nodes are iteratively added, and finally the resulting solution is returned.
Algorithm 1 Initialization
1: Let L denote the set of n customers. Sort them by increasing values of the earliest arrival times e_i; if nodes have the same e_i, arrange them by increasing values of the latest arrival times l_i.
2: Let T denote the list of n_v tours. Initially, each tour in T contains only a single node, the vehicle at the depot.
3: i ← 0
4: while i < n do
5:     TabuList ← ∅
6:     while node i is not added to a tour do
7:         Calculate the distances d_ij between node i and t_j, where t_j denotes the last node of tour j.
8:         Find the index minIndex of the tour that has the shortest distance to node i:
9:         minIndex := arg min_{j ∈ {1,...,n_v} \ TabuList} d_ij
10:        if node i can be added to tour minIndex then
11:            Add node i to the end of tour minIndex.
12:        else
13:            Add minIndex to TabuList.
14:    i ← i + 1
15: return T
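The initialization heuristic can be sketched in a few lines. This is a sketch under assumptions: nodes are dictionaries with hypothetical keys `xy`, `e` and `l`, and `feasible(tour, c)` stands in for the real capacity and time-window check.

```python
from math import dist

def init_solution(customers, n_vehicles, depot, feasible):
    """Sketch of the ranking time-windows-based nearest-neighbor
    initialization: sort customers by (earliest, latest) time, then
    greedily append each one to the closest tour that can take it.
    Customers that fit nowhere are returned as unserved."""
    tours = [[depot] for _ in range(n_vehicles)]
    unserved = []
    # Sort by earliest time e_i; break ties by latest time l_i.
    for c in sorted(customers, key=lambda c: (c["e"], c["l"])):
        tabu = set()
        while True:
            candidates = [j for j in range(n_vehicles) if j not in tabu]
            if not candidates:          # no tour can take this customer
                unserved.append(c)
                break
            # Tour whose last node is closest to customer c.
            j = min(candidates,
                    key=lambda j: dist(tours[j][-1]["xy"], c["xy"]))
            if feasible(tours[j], c):
                tours[j].append(c)
                break
            tabu.add(j)                 # exclude this tour and retry
    return tours, unserved
```

The tabu set mirrors the TabuList of Algorithm 1: a tour that rejects the customer is excluded before the next-closest tour is tried.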
After initialization, a timer is started that keeps track of t, the used CPU time in seconds. The algorithm then runs online during the working day, which ends at time T_wd. Let T* denote the currently best solution. At the start of each time slice the controller checks whether any new customer nodes became available during the last time slice. If so, these new nodes are inserted using the InsertMissingNodes method in order to update T*. Thereafter, some of the nodes are given the status committed; the position of committed nodes in the tour cannot be changed anymore. If v_i is the last committed node of a vehicle in the tentative solution, v_j is the next node, and t_ij is the travel time from node v_i to node v_j, then v_j is committed if e_j − t_ij < t + t_ts. When the necessary commitments have been made, the two ant colony systems (ACS) are started. When a new time slice starts, the colonies are stopped and the controller repeats its loop.
The pseudo-code of the controller is shown in Algorithm 2. The system contains two colonies, each of which tries to improve a different objective of the problem. The ACS-VEI colony searches for a solution that uses fewer vehicles than T*. The ACS-TIME colony searches for a solution with a smaller traveling cost than T* while using at most as many vehicles as the best solution so far, i.e. T*. A solution with fewer vehicles has higher priority than a solution with a smaller distance. Once a feasible solution with fewer vehicles is found by ACS-VEI, the controller restarts the colonies.
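The commitment step can be sketched as follows. This is a sketch under assumptions: nodes carry their earliest time under the hypothetical key `e`, `travel_time(i, j)` is an assumed helper, and we iterate the paper's rule e_j − t_ij < t + t_ts along the route, which is our reading of how successive nodes become committed.

```python
def commit_nodes(route, committed_upto, travel_time, t, t_ts):
    """Starting from the last committed node, commit each successive
    node v_j whose service would have to start before the end of the
    next time slice, i.e. e_j - t_ij < t + t_ts."""
    k = committed_upto
    while k + 1 < len(route):
        v_i, v_j = route[k], route[k + 1]
        if v_j["e"] - travel_time(v_i, v_j) < t + t_ts:
            k += 1          # v_j may no longer be changed
        else:
            break
    return k                # index of the last committed node
```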

Algorithm 2 Controller
1: Set time t = 0; set available nodes n
2: T* ← NearestNeighbor(n); τ_0 ← 1/(n · length of T*)
3: Start measuring CPU time t
4: Start ACS-TIME(vehicles in T*) in new thread
5: Start ACS-VEI(vehicles in T* − 1) in new thread
6: repeat
7:     while colonies are active and the time step is not over do
8:         Wait until a solution T is found
9:         if vehicles in T < vehicles in T* then
10:            Stop threads
11:            T* ← T
12:    if the time step is over then
13:        if new nodes are available or a new part of T* will be defined then
14:            Stop threads
15:            Update available nodes n
16:            Insert new nodes into T*
17:            Commit necessary nodes in T*
18:        if colonies have been stopped then
19:            Start ACS-TIME(vehicles in T*) in new thread
20:            Start ACS-VEI(vehicles in T* − 1) in new thread
21: until t ≥ T_wd
22: return T*

There are a few differences between the two colonies. ACS-VEI keeps track of the best solution found by the colony (T_VEI), which does not necessarily incorporate all nodes. As T_VEI also contributes to the pheromone trails, it helps ACS-VEI to find a solution that covers all nodes with fewer vehicles. ACS-VEI does not use local search methods. In contrast, ACS-TIME does not work with infeasible solutions, and it performs the local search method Cross Exchange (Taillard et al. 1997), which is shown in Fig. 1.
A constraint on the maximum number of vehicles that can be used is given as an argument to each colony. During the construction of a tour this number may not be exceeded, which can lead to infeasible solutions that do not incorporate all nodes. A solution that is not feasible is never sent to the controller. Both colonies work on separate pheromone matrices and send their best solutions to the controller. Pseudo-code for ACS-VEI and ACS-TIME can be found in Algorithms 3 and 4, respectively.

Algorithm 3 ACS-VEI(n_v)
1: Input: n_v is the maximum number of vehicles to be used
2: Given: τ_0 is the initial pheromone level
3: Initialize pheromones to τ_0
4: Initialize IN_i to 0 for i = 1, ..., N
5: Comment: IN_i is a counter for how many times customer node i has not been added to the solution.
6: T_VEI ← NearestNeighbor(n_v)
7: repeat
8:     for all ants k do
9:         …

Algorithm 4 ACS-TIME(n_v)
1: Input: n_v is the maximum number of vehicles to be used
2: Given: τ_0 is the initial pheromone level
3: Initialize pheromones to τ_0
4: repeat
5:     for all ants k do
6:         T^k ← ConstructTour(k, 0)
7:         Local pheromone update on edges of T^k using Eq. (4)
8:         if T^k is a feasible tour then
9:             T^k ← LocalSearch(k)
10:    Find the feasible ant l with the smallest tour length
11:    if length of T^l < length of T* then
12:        Return T* to the controller
13:    Global pheromone update with T* and Eq. (5)
14: until controller sends stop signal

Algorithm 5 describes the construction of a tour by means of artificial ants. A tour starts at a randomly chosen depot copy. When constructing a new tour, the committed parts of T*, which cannot be changed any more, have to be incorporated first. Then the tour is iteratively extended with available neighborhood nodes. There are many ways to define the topology of the neighborhood; in this paper, the neighborhood set N_i^k of ant k situated at node i contains all available nodes that have not yet been committed or visited. Nodes that are inaccessible due to capacity or time-window constraints are excluded from N_i^k. To decide which node to choose, the probabilistic transition rules of Dorigo and Gambardella (1997) are applied.
For ant k positioned at node v_i, a random number q between 0 and 1 is drawn and compared to a threshold q_0 ∈ [0, 1]. If q ≤ q_0, the ant moves to the node that maximizes τ_ij^α · η_ij^β; otherwise the probability p_j^k(v_i) of choosing v_j as the next node is given by the transition rule

p_j^k(v_i) = (τ_ij^α · η_ij^β) / (Σ_{u ∈ N_i^k} τ_iu^α · η_iu^β),   (3)

with τ_ij being the pheromone level on edge (i, j), η_ij the heuristic desirability of edge (i, j), α the influence of τ on the probabilistic value, β the influence of η on the probabilistic value, N_i^k the set of nodes that can be visited by ant k positioned at node v_i, and τ_ij, η_ij, α, β ≥ 0.
The node-selection part of the tour construction is taken from Dorigo and Gambardella (1997); an excerpt of Algorithm 5 reads:

Pick node j using Eq. (3)
if j is a depot copy then
    current_time_k ← 0
    load_k ← 0
    x ← x + 1
    for all committed nodes v_i of the x-th vehicle of T* do …

During the ConstructTour process of ACS-VEI, the IN array is used to give greater priority to nodes that were not included in previously generated tours. The array counts the successive number of times that node v_j was not incorporated in constructed solutions; this count is then used to increase the attractiveness η_ij. The IN array is only available to ACS-VEI and is reset when the colony is restarted or when it finds a solution that improves T_VEI. ACS-TIME does not use the IN array, which is equivalent to setting all its values to zero.
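The node choice can be sketched as follows, assuming the standard ACS pseudo-random-proportional rule of Dorigo and Gambardella (1997); the function names and the dictionary representation of τ and η are our illustration.

```python
import random

def choose_next(i, candidates, tau, eta, alpha=1.0, beta=1.0,
                q0=0.9, rng=random):
    """Pseudo-random-proportional rule: with probability q0 pick the
    greedy best edge, otherwise sample a node with probability
    proportional to tau^alpha * eta^beta."""
    weight = lambda j: (tau[(i, j)] ** alpha) * (eta[(i, j)] ** beta)
    if rng.random() <= q0:
        return max(candidates, key=weight)          # exploitation
    weights = [weight(j) for j in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]  # biased exploration
```

With a high threshold such as q_0 = 0.9 (the default used in the experiments), most moves are greedy and only occasionally is a node sampled probabilistically.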
The local pheromone update rule from Dorigo and Gambardella (1997) is used to decrease pheromone levels on edges that are traversed by ants. Each time an ant has traversed an edge (i, j), it applies Eq. (4):

τ_ij ← (1 − ρ) · τ_ij + ρ · τ_0.   (4)
By decreasing pheromones on edges that have already been traveled, there is a bigger chance that other ants will use different edges. This increases exploration and should avoid premature stagnation of the search. The global pheromone update rule is given in Eq. (5). To increase exploitation, pheromones are only evaporated and deposited on edges that belong to the best solution found so far, and Δτ_ij is multiplied by the pheromone decay parameter ρ:

τ_ij ← (1 − ρ) · τ_ij + ρ · Δτ_ij,  with Δτ_ij = 1/L* if (i, j) ∈ T* and Δτ_ij = 0 otherwise,   (5)

where T* is the best tour found so far and L* is the length of T*. Gambardella et al. (1999) have shown that MACS is very efficient at solving static vehicle routing problems with time windows. Here we test and benchmark the extended algorithm on dynamic vehicle routing problems with time windows.
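The two update rules can be sketched directly, assuming the standard ACS equations from Dorigo and Gambardella (1997); the dictionary representation of the pheromone matrix is our illustration.

```python
def local_update(tau, edge, rho, tau0):
    """Local pheromone update (Eq. 4): move the pheromone on a
    just-traversed edge toward tau0, encouraging exploration."""
    tau[edge] = (1 - rho) * tau[edge] + rho * tau0

def global_update(tau, best_tour_edges, best_length, rho):
    """Global pheromone update (Eq. 5): evaporate and deposit only
    on edges of the best tour T*, with delta = 1 / L*."""
    delta = 1.0 / best_length
    for edge in best_tour_edges:
        tau[edge] = (1 - rho) * tau[edge] + rho * delta
```

Because the local rule pulls pheromone back toward τ_0 while the global rule reinforces only T*, the two updates balance exploration and exploitation.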

Benchmark on simulated data
The Solomon benchmark (Solomon 1987) is a classical benchmark for the static VRP. It provides six categories of scalable VRPTW problems: C1, C2, R1, R2, RC1 and RC2. C stands for problems with clustered nodes, R problems have randomly placed nodes, and RC problems have both. In problems of type 1, only a few nodes can be serviced by a single vehicle; in problems of type 2, many nodes can be serviced by the same vehicle.
To turn this into a dynamic problem set, we apply a method proposed by Gendreau et al. (1999) for a VRP problem to the more comprehensive VRPTW benchmark by Solomon. A certain percentage of the nodes is only revealed during the working day: a dynamicity of X% means that each node has a probability of X% of getting a nonzero available time, i.e. the time at which the order is revealed. The available time is generated on the interval [0, ē_i], where ē_i = min(e_i, t_{i−1}) and t_{i−1} is the departure time from v_i's predecessor in the best known solution. These best solutions are taken from the results of a static MACS-VRPTW implementation (see Table 1); for the detailed schedules we refer to the support material available at http://natcomp.liacs.nl/index.php?page=code. By generating available times on this interval, the optimal solution can still be attained, enabling comparisons with MACS-VRPTW. Table 2 shows how the average results and standard deviations change with the dynamicity level.
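The benchmark construction can be sketched as follows. This is a sketch under assumptions: we sample the available time uniformly on the interval (the sampling distribution is not stated explicitly), and the field names `id`, `e` and `available` are hypothetical.

```python
import random

def add_dynamicity(customers, departure_times, dynamicity, rng=random):
    """Each customer becomes dynamic with probability `dynamicity`
    and gets an available time drawn from [0, min(e_i, t_prev)],
    where t_prev is the departure time from its predecessor in the
    best known static solution, so the static optimum stays
    attainable."""
    for c in customers:
        c["available"] = 0.0                      # static by default
        if rng.random() < dynamicity:
            upper = min(c["e"], departure_times[c["id"]])
            c["available"] = rng.uniform(0.0, upper)
    return customers
```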
The implementation was run ten times on an Intel Core i5, 3.2 GHz CPU with 4 GB of RAM. The controller stops after 100 s of CPU time. The following default parameters are set according to the literature: m = 10, α = 1, β = 1, q_0 = 0.9, ρ = 0.1 (cf. Gambardella et al. 1999), T_wd = 100 s, and n_ts = 50 (cf. Montemanni et al. 2005).
To the best of our knowledge, no other algorithm has been implemented for this problem. In this paper, four variants of the algorithm are generated in order to improve its performance: (1) default settings as described above, (2) spending 20 CPU seconds before the start of the working day to construct an improved initial solution (IIS), (3) a pheromone-preservation variant (WPP), and (4) min–max pheromone update (MMAS) as in Stützle and Hoos (1997). For MMAS, we set ρ = 0.8; the pheromone bounds are updated every time a new improvement of T* is found.

[Table 1: Comparison of results reported for the original MACS-VRPTW in Gambardella et al. (1999).]
Average results for IIS and MMAS are almost identical to the original results. The reason seems to be that although the initial solution is greatly improved, it becomes more difficult to insert new nodes into the current best solution. Tables 3 and 4 show results for the different problem types in more detail. WPP improves the distance results for 10% dynamicity and MMAS for 50% dynamicity, both at the price of slightly more vehicles. Another finding is that for 10% dynamicity solution quality declines by up to 20%, and for 50% dynamicity by up to 50%.
From a practical perspective, for a small dynamicity of 10% at most one additional vehicle is needed compared to scheduling the same amount of static orders, and in many cases the same number of vehicles suffices. For 50% dynamicity the number of vehicles almost always increases by one vehicle and in some cases even by two.

Case study
This section will explain the details of the case study. First the test case which was used for the pilots will be discussed. Then the initially implemented algorithm is described. Finally, the execution of real-world pilots will be discussed, including the intermediate revisions of the algorithms that were motivated by problems encountered in real-world testing.

Test case
To show that the method can be successfully applied in practice, a field study (with real drivers and vehicles) was conducted. The pilot study was carried out with the Dutch security company Trigion (http://trigion.nl) on a scenario that resembles a typical working day in mobile surveillance. Every day this security company has between 300 and 400 planned jobs in the Rotterdam area. These planned jobs include surveillance, security checks, and the opening or closing of buildings, among others. There are strict contracts about the time windows and tasks included in such a job. The average service time for each job is known, and its deviation, along with a typical minimum and maximum service time, is also well known; these numbers are all derived from historical data. There is an average of about 45 incidents (or alarms) per day within the same region, although this number can vary from 30 to 110. These incidents can for instance be fire alarms, burglary alarms or technical problems. They appear during the day and cannot be predicted. Some general patterns exist, e.g. most alarms occur in the evening and on industrial terrains, but their exact times and other properties are not known beforehand. This business case is therefore well suited to a DVRPTW formulation; it has an average dynamicity of 11.6%.
To use the business case as a practical real-world test case for a DVRPTW algorithm, the case needed to be scaled down. For 400 jobs a few dozen vehicles would be needed, and a pilot of this size would be outside our scope because of finances, time and complexity. Therefore, a test case of five vehicles was created: four vehicles for static jobs from the same depot and the same day, all with addresses close to each other, which resembles the problem for a smaller area with a single depot. These 4 vehicles had to cover a total workload of 48 jobs. In addition, one incident vehicle from the same area and day was selected, covering nine incidents. This gives a dynamicity of 15.8% (9/(48 + 9)), which is relatively high compared to the average of 11.6% in the real-world business case; this was done on purpose to make the test case challenging. The 57 orders were anonymized by selecting an address up to two streets away from the original address; due to the small perturbation radius this still makes a realistic test case. The time windows of the jobs in the test case all lay within a 6-hour time frame in the evening. To give a general view of the addresses in the test case, the map with all customers is shown in Fig. 2. A characteristic of this problem is that orders are concentrated in two central parts of the urban agglomeration, with fewer orders in the peripheral parts.
In the pilot study each customer (or job) i has the following properties:
• A location, given as an address. The travel time, cost or distance d_ij between two jobs i and j can be calculated by a navigation (web) service, such as Google Maps.
• A service time s_i, the time it takes to complete the job. The service time is not always known a-priori; sometimes a job takes unexpectedly long or short (e.g. when a burglary alarm turns out to be a false alarm).
• A time window [e_i, l_i]. The security company is contractually obliged to visit within this time frame. Most time windows span multiple hours, some less than an hour; an incident time window is either 30 or 45 minutes.
• A priority p, ranging from 1 to 4: 1 and 2 for incidents, 3 and 4 for static jobs, with 1 being the highest priority (e.g. a fire alarm). Some customers have more expensive fees for tardiness and thus a higher priority.
• An availability (occurrence) time. All static jobs are available at t = 0; incidents become available during the day. The availability time of an incident equals its time-window start e_i, because incidents can always be visited as soon as they become available, in contrast to static jobs.
The jobs which are known a-priori will be referred to as static jobs. Static jobs have an average service time of 25 min, ranging from 1 min for a short check to 8 h for a surveillance. The dynamically assigned jobs are referred to as incidents. Incidents have an average service time of 16.5 min, but their total range is from only a few seconds (a false alarm) up to multiple hours in case of a burglary arrest. However, an incident usually takes 10-30 min. Locations are usually clustered in business areas.
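The job model above can be summarized in a short sketch. The class and field names below are our own illustration, not taken from the pilot's implementation; the dynamicity calculation reproduces the 15.8 % figure of the test case (9 incidents against 48 static jobs).

```python
from dataclasses import dataclass

@dataclass
class Job:
    location: str       # address; travel times d_ij come from a routing (web)service
    service_time: float # minutes; only an estimate a-priori
    tw_start: float     # e_i, minutes since start of day
    tw_end: float       # l_i
    priority: int       # 1-2 for incidents, 3-4 for static jobs, 1 = highest
    available_at: float # 0 for static jobs; e_i for incidents

def dynamicity(static_jobs, incidents):
    """Fraction of all jobs that only become known during operation."""
    n_inc = len(incidents)
    return n_inc / (len(static_jobs) + n_inc)

# Figures from the test case: 48 static jobs, 9 incidents.
static = [Job("some address", 25.0, 0, 360, 3, 0) for _ in range(48)]
inc = [Job("some address", 16.5, 60, 105, 1, 60) for _ in range(9)]
print(round(100 * dynamicity(static, inc), 1))  # → 15.8
```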

Gaps and adaption
At the moment there is almost no dynamicity implemented in the baseline algorithm used in the business case. All jobs which are known a-priori, the static jobs, are scheduled by a state-of-the-art static VRPTW algorithm. The exact algorithm is unknown to us, as it is confidential. Also, a number of vehicles is always on stand-by. Their job is solely to react to any incoming incidents. Incidents are assigned by a (human) coordinator. In most cases an incident will go to the closest stand-by vehicle. In very rare cases, an incident will be picked up by a static job vehicle. The coordinator might need to do some manual rescheduling in this case.
This approach has some disadvantages:

1. The response to incidents might be too late if all incident vehicles are busy at the same time.
2. It takes time for the coordinator to plan all the incidents, especially when multiple incidents come in at once and routes need to be rescheduled.
3. On a quiet day (a day with fewer than average incidents), the incident vehicles will be idle most of the time. This results in unnecessary labor time and bored employees.
Possible advantages of such an approach are:

In order to test the MACS algorithm, trial 1 was conducted to find the gaps between the theoretical benchmark problem and the real-world problem. The conclusions drawn from the first pilot were used to improve the implementation of the algorithm. A list was made of each required improvement and these were implemented iteratively. The most important revisions were:

1. Balancing of the vehicles. During the pilot some vehicles were very busy, while others had hardly any work (i.e. 25 and 2 jobs respectively). This can be seen in the results section (Sect. 7), where Fig. 3b shows a vehicle with a significantly high amount of orders during the entire pilot. This resulted in the busy vehicles being late. Balancing also helps to give some buffer time in case an incident has to be handled. Balancing was achieved by giving the vehicles a maximum amount of orders during initialization in the nearest neighbor algorithm. This maximum was chosen as n/(n_v − 1), where n_v is the maximum number of vehicles that can be used in the pilot.
2. When a driver is already performing a job or driving towards a job, he/she should not be interrupted, i.e. this job should not be reassigned to another driver.
3. At the moment of recalculating the routes, it is important to keep track of the current time and the current position of the vehicles to check if any vehicles will be late. It might be necessary to reschedule in order to prevent tardiness.
4. The vehicle speed used in planning was initially assumed too high, since most of the pilot took place in an urban area. It was reduced to 30 km/h.
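The balancing rule of revision 1 can be sketched as a capped nearest-neighbor construction. The function below is our own minimal interpretation, not the pilot's code; in particular the rounding of n/(n_v − 1) to a whole number of jobs is an assumption.

```python
import math

def balanced_nearest_neighbor(dist, n_vehicles):
    """Greedy nearest-neighbor route construction with a per-vehicle job cap.

    dist[i][j] is the travel time between locations i and j, where index 0
    is the depot and 1..n are the jobs.  Capping each vehicle at roughly
    n/(n_v - 1) jobs spreads one vehicle's worth of slack over the fleet,
    leaving buffer time for incoming incidents.
    """
    n_jobs = len(dist) - 1
    cap = math.ceil(n_jobs / (n_vehicles - 1))  # rounding is our assumption
    unvisited = set(range(1, n_jobs + 1))
    routes = []
    for _ in range(n_vehicles):
        route, current = [], 0  # every vehicle starts at the depot
        while unvisited and len(route) < cap:
            nxt = min(unvisited, key=lambda j: dist[current][j])
            route.append(nxt)
            unvisited.remove(nxt)
            current = nxt
        routes.append(route)
    return routes
```

With the pilot's 48 static jobs and 5 vehicles this caps every route at 12 jobs, instead of allowing one vehicle to absorb 25 jobs as happened in trial 1.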
Also the controller was adjusted to the real-world situation. The controller of the implemented algorithm is displayed in Algorithm 6. The adjustments to this controller are:

1. The algorithm is not constantly searching for better routes, because the amount of changes to driver schedules should be minimized to avoid confusing the drivers. The cost of a small change could be greater than its gain. The algorithm is not actively calculating after updating the schedules and before a new incident is introduced.
2. The number of iterations used by the ant colonies was set to 5000. This number was found to produce acceptable results within a minute. A short total calculation time was necessary to update routes as quickly as possible after an incident occurred. This number might need to be changed when the test case is scaled up or down.
3. The first job of a vehicle will always be locked on the first position of its route, so the driver never loses a job he/she is already performing. Also, when a driver has started driving towards a customer, this customer should not be rescheduled to another driver.

Algorithm 6
The controller of the final implementation of the MACS-DVRPTW algorithm.
1:  Set time t = 0
2:  T* ← NearestNeighbor
3:  while not terminate initial calculation do
4:      Start ACS-TIME with nv = nv of T*
5:      Start ACS-VEI with nv = nv of T* − 1
6:      Wait until a solution T is found
7:      if nv of T < nv of T* then
8:          Stop colonies
9:          T* ← T
10: Stop colonies
11: Update routes
12: Start execution of problem solution
13: while execution of DVRPTW is not over do
14:     Wait for new incident
15:     Lock current task of each vehicle
16:     for each missing node do
17:         Calculate cost of each possible insertion in each route in T*
18:         Insert node where cost is lowest
19:     Get current time and vehicle locations
20:     if routes are feasible then
21:         return T* as default solution and broadcast update to drivers
22:     else
23:         Start ACS-VEI with nv = nv of T*
24:         Wait until a feasible solution T* is found
25:         return T* as default solution and broadcast update to drivers
26:         Stop colonies
27:     Start ACS-TIME with nv = nv of T*
28:     Wait until MaxTime is reached
29:     if T* is much better than the default solution then
30:         return T* and broadcast update to drivers
31:     Stop colonies
32:     Update routes

Other important adjustments to the algorithm were:

1. High priority is given to returning a feasible solution as fast as possible. This is why a solution can already be returned to the controller directly after the direct insertion method finishes. If no feasible solution is available, ACS-VEI is used first, as it searches with priority for feasible solutions.
2. ACS-TIME is used to find improvements of feasible solutions after having found a default feasible solution.
Only if it succeeds in finding a much better solution (a threshold is used here) will this new solution be returned and broadcast as an update to the drivers.
3. If the colony is trying to add missing nodes to an infeasible route, the highest priorities will be added first, if possible. The missing nodes are sorted by priority.
4. Feasibility of a route is based on the current location of the vehicles, which can be viewed as starting positions or depots when introducing an incident. Feasibility is also based on the time at the moment of calculation; therefore, past time windows are not considered anymore. By considering the retrieved time and vehicle locations, more accurate schedules can be made when a new incident is introduced while vehicles are driving towards a job.
5. Driving speed is by default 30 km/h, which is a good average speed for urban areas, allowing for some buffer time. Also, in many areas the maximum speed is 30 km/h by law.
6. The nearest neighbor heuristic intends to distribute the jobs relatively evenly across the vehicles. This gives a balanced initial solution for the ACO pheromone initialization. Recall that this is achieved by giving each vehicle a maximum of n/(n_v − 1) jobs.

Pilot experiments
Next, the practical details of the experiments and the observations that were made will be discussed. To successfully implement a DVRP it is crucial to know the location of the vehicles and their status at the moment a new job occurs. To achieve this, the DEAL platform described in Mahr and de Weerdt (2005) was used. This platform is made for managing workflows in logistics. All drivers can use a mobile application to update their status and GPS locations. The DEAL mobile application also shows the drivers and the coordinators the sequence of jobs and their locations. The ACO algorithm was implemented as an external algorithm agent which was able to get an overview of the available jobs and the available vehicles. When this algorithm agent was triggered, it used ACO to rearrange the routes of the vehicles. To test how well the algorithm performed in practice, two teams with five drivers each were hired. Team A worked according to the solution of the baseline algorithm provided by the security company. For this team four cars were assigned to static orders in a predetermined schedule, while one car visited all the incidents. It was used as a control group for baseline comparison, while Team B tested the performance of the MACS algorithm. All five of Team B's cars were assigned to the static orders. When a new incident occurred, it was assigned to one of these running cars by the algorithm. In order to get a fair comparison between the teams, both teams got their jobs assigned to them through the DEAL mobile application. However, Team A's incident driver got a text message each time he or she was assigned a new incident, as is common practice for the security company. Team B's drivers were instructed to be aware of changing routes at all times. Each time an incident became available, the agent was triggered to change Team B's routes. This was done on-the-fly.
Both teams started at a time that would enable them to reach their first address on time, according to the security company's planning. Team B's vehicles were all available for incidents from the time they started.
The second pilot experiment consisted of only five drivers, referred to as Team C. This pilot became necessary because of shortcomings in the new scheduling method that needed to be corrected. For reasons of cost and practical feasibility another control group was not included. The first control group results proved very consistent and there was no strong need to test these results again, since the situation was expected to be very similar. Both pilots were conducted on a Friday, during the same time period, with no large weather differences. However, a small bias was introduced by an unexpected traffic jam that occurred during the second pilot. Much like Team B of Pilot 1, the five cars of Team C were sent out to visit their dynamic routes, which were determined on-the-fly by the (improved) algorithm agent. This time, there was a bigger focus on the minimization of labor hours, therefore not all cars started at the beginning of the pilot. Two cars started driving at the start of the pilot. Three other cars were given a customized starting time, based on the start of the time window of their first planned job.
As mentioned above, during Pilot 2 a traffic jam occurred, which caused some orders to be late and others to fail. Because another pilot was not affordable, we decided to create a virtual Team D and run a simulation pilot (Pilot S) based on the data obtained in Pilot 2.

Results
This section contains and discusses the results of all conducted pilots and of the simulated Team D. First of all, the performance of the teams will be discussed. After that, the survey of the drivers' experience will be summarized. Finally, the lessons learned on bridging theory and practice will be summarized in order to help other researchers to implement their algorithm in the real world.

Performance assessment
All the data during the pilots was stored, which gave us good insight into the real-world timing of the algorithm. For MACS to perform well on the business case, it is important that there are as few contract violations as possible. Therefore, it is important to look at the timeliness of drivers, since they could arrive too late. It is also possible that a job is not visited at all, either because the driver was running too late or because the algorithm considered it infeasible. On rare occasions (twice) a job was started before the time window; this was (in our case) due to human error.
The results for the static jobs of Team A (Control Group in Pilot 1), Team B (Pilot 1), Team C (Pilot 2) and Team D (Simulation Group in Pilot S) are shown in Table 5, and in Table 6 the incident results can be seen. These results show that the control group performed relatively well and stable. No control group driver arrived too late, either for a static event or for an incident. The route executed by the control group was based on the planning of the security company, which had executed this route many times before the pilot ran.
The first algorithm pilot experienced some problems. The most important ones are mentioned in Sect. 6.2, since they were used to improve the implementation before starting Pilot 2. The problems in Pilot 1 caused a significant number of jobs to fail or at least be late, as can be seen in Tables 5 and 6. More than one third of the jobs were not finished in Pilot 1. This is not acceptable for the business case. An important cause of this tardiness was that one vehicle was scheduled to have more jobs than it could handle. Figure 3b shows that vehicle 2 was given many more orders than the other vehicles. This problem remained during the entire pilot, even though vehicle 3 had already finished its jobs by the time the fifth incident occurred. This vehicle could have taken on some of the excess jobs from vehicle 2, but it did not.
After making the improvements of Sect. 6.2, Pilot 2 was conducted. A great improvement compared to Pilot 1 was observed. In Fig. 3c we can see that the jobs are more evenly distributed between vehicles and that these total amounts have a downward slope as time progresses.
Partly because of this even distribution, the timeliness of Pilot 2 was much more acceptable. Only two (static) jobs remained unvisited. Five jobs were too late, with a total late time of 50 min. However, halfway through the pilot, one of the drivers got stuck in a traffic jam which was not present during the control group pilot. Two jobs were located in the middle of this traffic jam, both with an arrival time relatively close to the planned arrival times of the control group (within the same hour). So it is safe to say that the control group could also have experienced some delay; at the very least, the Pilot 2 driver would have experienced less or no delay had the traffic jam not been present.
In Pilot S there was no traffic jam. The results showed that all jobs were visited and none were late or early. This gives us more evidence that the algorithm can succeed in practice under normal circumstances.
For the real-world case, the most important metric is the total labor time. These results are presented in Table 7. The total labor time needed is the accumulated driving time of all cars, including driving from and towards the depot. The total driving times without the legs to and from the depots are also shown. This provides an impression of the on-line performance, excluding the influence of the starting and finalization strategy. The total times of Team B and Team C seem to be the shortest, but this is because jobs were left unfinished. For Team D we see a reduction in total labor time of 5 % compared to the control group.
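The two labor-time figures of Table 7 can be computed as follows. This is a minimal sketch with illustrative numbers, assuming each route's leg durations are given in driving order with the first and last leg being the depot legs.

```python
def labor_times(route_legs):
    """Return (total driving time, driving time excluding depot legs).

    route_legs -- one list of leg durations (minutes) per vehicle, in
    driving order; legs[0] and legs[-1] are from/to the depot.  Excluding
    them gives an impression of the on-line performance, independent of
    the starting and finalization strategy.
    """
    with_depot = sum(sum(legs) for legs in route_legs)
    without_depot = sum(sum(legs[1:-1]) for legs in route_legs)
    return with_depot, without_depot

# Two vehicles with made-up leg durations in minutes.
total, online = labor_times([[10, 5, 7, 12], [8, 20, 9]])
print(total, online)  # → 71 32
```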

Drivers experience survey
During the pilots the drivers took forms with them on which they could take notes about their jobs, including arrival times and stress levels. This was done to gather insights into the human factor of the implementation. The most important outcomes of the survey of Pilot 1 were:

1. The changing of routes was experienced as 'confusing' by some drivers.
2. A driver felt it was pointless that he had to drive back and forth from one side of the city to the other and back again. The experience of the driver was negative because he did not know the global solution.
3. Most stress was experienced by drivers that were running late.
4. Most drivers said they felt more confident about the execution of their tasks because they got a clear briefing beforehand and because they could contact a coordinator at all times.
5. Most drivers felt the planning was tight, but not too tight or stressful.
Outcomes 1 and 2 were only relevant for the drivers that tested the dynamic ACO algorithm (Pilot 1). The survey of Pilot 2 also confirmed outcomes 3 and 5. Furthermore, the following results came out of the survey:

6. Two drivers found that a more frequent refresh of the job list would be helpful. A forced refresh each time a route is changed might be even more effective.
7. One driver experienced quite some stress during a traffic jam.
8. Four drivers had already participated in the first trial and experienced that the second went much smoother. This was attributed mostly to the relative absence of problems, such as disappearing jobs.

The drivers of Pilot 2 were given a form to write down their arrival times as well as their stress, confidence, or certainty levels. The range is from 1 to 5, where 1 is '(almost) none' and 5 is 'a lot'. Stress and confidence levels were evaluated when arriving at a job.
At most times (42/55) stress was 1 (very low) and confidence was 5 (very high). When stress went up, it usually (7/12) meant that the driver's confidence was low. The drivers experienced stress in the following situations: • The driver was running late.
• The driver got stuck in traffic.
• The driver took a wrong turn, delaying his route.
• The driver was not sure if finishing a job outside of the time window also counted as being late.
The number of occurrences of the first and the second situation can (partly) be reduced by smarter algorithms and by adding data on the traffic situation. To avoid the third situation, training of the drivers and inclusion of buffer time could be beneficial. The last situation can easily be avoided by a better briefing of the drivers.

From theory to practice: lessons learned
Implementing in practice means testing in practice. When working with real-world cases and data, one cannot simply implement something and only test on academic benchmarks. Some general lessons on bringing routing algorithms from theory to practice have been learned, and we condensed them into three key principles:

• Iteration works. It is impossible to know beforehand all the functionality the algorithm implementation needs and all the situations that might occur in practice. Therefore it is important to keep in mind that requirements might change. A real-world test gives a clearer view of the elements needed. It is still a good idea, however, to get a head start on the requirements by doing simulated benchmarks. Starting with a thorough analysis of the business case can also give a good indication of which particularities require attention. In our first pilot, we could have avoided some mistakes by better analyzing the effect of clustering on the job distribution. Handling of various kinds of constraints is often specific to the real-world scenario, and algorithms will only succeed if they are flexible enough for adaptation.

• Communication is key. Implementing an algorithm in a real-world environment is not a one-person job. In our case we needed at least an optimization algorithm expert, a logistics systems/workflow manager (DEAL), a logistics company providing a business case, and a team of drivers. These experts had to be able to communicate with each other. Social and business aspects of the project needed to be addressed, besides the technical aspects. While confidentiality issues needed to be respected, at the same time it had to be ensured that enough insights were gained from the pilot in order to improve the algorithmic methods.

• People are important. The customers and drivers should play an important role in the development of the end result.
After all, they will be using it, and if they do not understand the algorithm's instructions they may even start to ignore them or complain. We found that a clear briefing and description of tasks and expectations contributed to the confidence of the drivers. Changing routes comes at a psychological cost, as the driver was already primed (mentally prepared) for another task. Therefore, route changes should be presented as transparently as possible, so that the employee sufficiently comprehends the logic of his route, i.e. does not doubt the efficiency of the schedule. It is also important to consider that employees need to feel useful and need to have the feeling that they are treated fairly.

Summary and outlook
This work proposed a dynamic algorithm for the VRPTW that allows new orders to be integrated into the schedule during operation.
A new algorithm, MACS-DVRPTW, was introduced and described. It extends the state-of-the-art ant colony based meta-heuristic MACS-VRPTW to dynamic VRPTW problems. A dynamic benchmark was created based on the static Solomon benchmark for the VRPTW by revealing some of the orders to the algorithm only during operation time. Statistical studies were conducted, showing that the MACS-DVRPTW algorithm performs better than state-of-the-art algorithms on the academic benchmarks. In the pilot experiments, adaptations were needed in order to achieve competitive performance. The new version of the algorithm performs better than the company's solution in terms of total driving time, but it still requires improvement in terms of real-world constraint handling for special situations such as traffic jams. It will also be interesting to compare this algorithm with other dynamic methods such as Wang et al. (2010) and Lung and Dumitrescu (2010). Another major finding was that the human factor is important. In order to account for this in the development phase, three main principles have crystallized, which we summarize as: iteration works, communication is key, and people are important.
In future work these principles need to be applied more fully. Besides optimization, the interaction between drivers and software also seems to play a major role. Here techniques from transaction management could prove useful, e.g. to design a protocol that makes it possible to deal with sudden changes of the situation, such as traffic jams, and performs regular checks on the feasibility of the current plan based on feedback about the drivers' locations. A full integration of the available information from GPS tracking will, however, require major adaptations to the design of the scheduling algorithm and is therefore left for future work.