On the stochastic vehicle routing problem with time windows, correlated travel times, and time dependency

Most state-of-the-art algorithms for the Vehicle Routing Problem, such as Branch-and-Price algorithms or meta heuristics, rely on a fast feasibility test for a given route. We devise the first approach to approximately check feasibility in the Stochastic Vehicle Routing Problem with time windows, where travel times are correlated and depend on the time of the day. Assuming jointly normally distributed travel times, we use a chance constraint approach to model feasibility, where two different application scenarios are considered, depending on whether missing a customer makes the rest of the route infeasible or not. The former case may arise, e.g., in drayage applications or in the pickup-and-delivery VRP. In addition, we present an adaptive sampling algorithm that is tailored for our setting and is much faster than standard sampling techniques. We use a case study for both scenarios, based on instances with realistic travel times, to illustrate that taking correlations and time dependencies into account significantly improves the quality of the solutions, i.e., the precision of the feasibility decision. In particular, the nonconsideration of correlations often leads to solutions containing infeasible routes.


Introduction
Vehicle Routing Problems (VRP) were and still are subject of countless studies in the literature of operational research. These problems consist in finding a set of vehicle routes serving certain customer requests with the minimum total travel cost. The term VRP in general refers to the deterministic problem, in which all the data are known and not subject to uncertainty. However, in practice, due to a variety of influences such as traffic jams, weather, customer availability etc., the real situation is often unpredictable, so that a solution of the deterministic problem in most cases produces solutions that are far from optimal in reality. To deal with this problem, specific approaches for addressing these uncertainties are needed.
The aim of this paper is to devise a new approach to check the feasibility of routes in the VRP with time windows (VRPTW) assuming stochastic travel times which can be used in branch-and-price algorithms as well as in many heuristic frameworks. Uncertainty is taken into account in several ways: firstly, the cost of a route depends on its expected travel times and is hence a random variable itself. Secondly, if a driver arrives too early at a customer's location, the resulting waiting time has to be taken into account. Our model allows to calculate different costs for waiting and driving. Thirdly, and most importantly, a route may turn out to be infeasible under certain realizations of the travel times, due to the risk of missing some time window.
Unlike many other approaches presented in the literature, we explicitly deal with dependent travel times in our approach, assuming a joint normal distribution. In practice, neighboring streets often have highly correlated travel times, so that taking into account the dependency of travel times is important in order to obtain feasible solutions. This is also confirmed by our computational study: considering such dependencies improves the precision of the solutions significantly. The importance of correlations between travel times has also been underlined in Park and Laurence (1999).
Our approach can be used in any context where single routes are considered separately. This is the case, e.g., in the well-known set partitioning model, where the variables correspond to potential routes that are either enumerated in a first phase or are generated on the fly in a column generation approach. Also many heuristic approaches produce potential routes that have to be checked individually for feasibility and therefore our proposed feasibility check can be integrated easily.
The solution method proposed in this paper is based on the chance constraint approach, where a route is accepted if the risk of a failure stays below a given threshold. We distinguish between two application scenarios. In the first scenario, the route is considered infeasible if for any of the customers on this route, the probability of missing her time window is too high. This means that the infeasibility for every customer is checked independently. Such a scenario occurs in applications that consist of only deliveries, e.g., post delivery, or only pickups, e.g., garbage disposal. We present a (single) chance constraint formulation for this scenario. In the second scenario, missing a customer's time window may render the entire route infeasible. Therefore, we aggregate the risk of failure for all customers and reject a route if the cumulated risk of failure is too high. The latter approach applies to variants of the VRP where pickups and deliveries are performed by the same vehicles, so that missing a customer might imply that some following customers cannot be served any more, e.g., due to lack of space in the vehicle. Another example for this is drayage, in which missing a customer can lead to a lack of empty containers at a later stop. In this situation, we are interested in the (joint) probability that at least one of the time windows is missed. For this setting, we present a joint chance constraint formulation. Moreover, for obtaining a more accurate stochastic model, we propose to consider truncated probability distributions in this case.
In both cases, we use and extend an idea of Ehmke et al. (2015). It addresses the fact that, due to possible waiting times, the arrival times at subsequent customers are not normally distributed any more. It is proposed in Ehmke et al. (2015) to approximate these travel times, which can be defined as maxima between two normally distributed variables, again by normally distributed travel times, with the same expected values and variances. In this context, we also refer the reader to Clark (1961), Nadarajah and Kotz (2008) and Sinha et al. (2007). We extend this approximation to the correlated case by also considering and updating the covariances between travel times. Additionally, we consider a situation where travel times change over the day, meaning that the time needed to travel an arc depends on when the arc is traversed.
It turns out that these extensions, though being computationally more expensive, lead to much more precise assessments of the feasibility of a route in realistic situations. In fact, our experiments on a set of instances with realistic travel times show that ignoring covariances leads to the creation of routes that are actually infeasible, when validated by sampling. The number of infeasible routes is decreased significantly when including covariance information in our algorithm. When travel times vary over the day, the difference between our approach taking this into account and an approach based on average travel times, is even larger, with many infeasible routes produced in the latter approach and, at the same time, objective values being better for our new algorithm.
The main purpose of our experiments is to show that considering covariances is crucial for an accurate feasibility check when instances with realistic travel times are solved. Additionally, we show that our proposed algorithms are a reliable feasibility check. As long as we do not consider time-dependent travel times (as in Section 4.2), the feasibility check could as well be performed by sampling, which would even be faster than our approach. However, we ultimately aim at combining correlations and time dependency, in which case sampling will not be a practicable approach any more; see our discussion in Sect. 4.2.2.

Literature review
Other publications using the chance constraint approach for dealing with time windows are Ehmke et al. (2015) with the assumption of independent travel times and Li et al. (2010) with independent travel and service times, others set restrictions on the probability that a vehicle's capacity is exceeded Dinh et al. (2018) and on the length of the travel time Nahum and Hadas (2009). In Li et al. (2010), travel and service times are random variables following a normal distribution. In particular, together with the chance constraint approach, they propose to formulate the stochastic VRP with time windows (SVRPTW) as a two-stage stochastic programming model with recourse. This consists in determining the route scheduling in the first stage, i.e., before the stochastic travel and service times are known, and in taking recourse actions to induce a penalty on the objective function in the second stage, i.e., once the two variables are realized. Unlike the chance constrained approach, stochastic programming models with recourse take into account the possibility of route failure and the resulting costs.
In Dinh et al. (2018), the goals are to compute lower bounds on the minimum number of vehicles required to serve a subset of customers and to present a pricing for the branch-and-cut-and-price (BCP) approach for the chance constrained VRP problem. The authors devise an improved relaxed pricing for independent normal demands and an extension to distributionally robust chance constraints. They show an interesting comparison between the chance constraint VRP (CCVRP) and the recourse models in the case of independent normal distribution. The recourse they assume is returning to the depot whenever a vehicle's capacity is exceeded. From the comparison, they conclude that "the CCVRP model tends to yield solutions that are high quality for the recourse model, whereas the reverse is not true. In addition, the CCVRP model is not dependent on a particular assumption of the recourse taken, and can be solved also when customer demands are not independent" Dinh et al. (2018). Nahum and Hadas (2009) address the Stochastic Time-Dependent VRP (STD-VRP). Their algorithm is based on the savings algorithm of Clarke and Wright (1964), designed for solving the deterministic CVRP. For transforming the stochastic and time-dependent data to deterministic data, they use the average value (average time for each time period and probability intervals), the best value (minimal time for all time periods, regardless of the probability) and the worst value (maximal time for all time periods, regardless of the probability) filters. In this way a candidate list of different deterministic estimators can be built for calculating routes, that are then analyzed by simulation. The authors of Nahum and Hadas (2009) state that "Based on our findings for stochastic time-dependent vehicle-routing problems, the results of the STDVRP are similar to the results of the saving algorithm for CVRP". Some reviews of SVRP literature can be found in Gendreau et al. (1996), in the detailed Oyola et al. (2018) and Oyola et al. (2017), and in Bastian and Rinnooy (1992) for the case of uncertain, independent and identically distributed customer demands. Articles dealing with uncertain travel times are Ehmke et al. (2015), Nahum and Hadas (2009), then Tas et al. (2013Tas et al. ( , 2014a and Van Woensel et al. (2003). Among these, in particular Nahum and Hadas (2009) and Tas et al. (2014a) study the time dependent case, in which travel times are stochastic and vary during the day. Vareias et al. (2019) deal with stochastic VRPs with uncertain travel times using chance constraints. As a subproblem, they solve the assignment of the time windows to optimality. Uncertain demands are considered in Dinh et al. (2018), Golden and Yee (1979), Guo and Mak (2004), and Novoa et al. (2006). An SVRP with simultaneous pickup and delivery under uncertain demands and travel times is dealt with by Hou and Zhou (2010). For works based on stochastic travel and service times, the reader is referred to Li et al. (2010), Miranda and Conceicao (2016), Zhang et al. (2013), and Gutierrez et al. (2016), where the latter solves the problem via a multi population memetic algorithm. For a VRP with stochastic service times only we refer to Errico et al. (2016).
Several papers assume soft time windows, namely Guo and Mak (2004), Tas et al. (2013Tas et al. ( , 2014a, and Zhang et al. (2013). Concerning the assumption of independence and dependence of the uncertain variables, Ehmke et al. (2015) assume independent travel times. Independent travel and service times are assumed in Li et al. (2010) and Miranda and Conceicao (2016). Dinh et al. (2018) consider correlated demands, while Golden and Yee (1979) assume that demands follow a Poisson distribution and show extensions with the Binomial, Negative-Binomial and Gamma distributions for solving the case of independent demand; in case of correlation they assume multivariate normally distributed demands. Rajabi-Bahaabadi et al. (2019) consider correlated travel times. For early and late arrival times at the customers, penalties are incurred. They solve the problem with a max-min ant colony system hybridized with a tabu search algorithm. Their exploratory analysis on real travel time data shows that travel times are significantly correlated and an appropriate candidate for modeling their uncertainty is the shifted log-normal distribution. Bakach et al. (2018) model a vehicle routing problem with a makespan objective incorporating both stochastic and correlated travel times.
With respect to solution methods, the majority of publications opt for heuristics and metaheuristics. Guo and Mak (2004) use a genetic based algorithm, Van Woensel et al. (2003) an Ant Colony Optimization heuristic, Dinh et al. (2018) adapt and extend the Clarke and Wright's heuristic Clarke and Wright (1964) to obtain primal solutions to the chance-constrained VRP and investigate a Dantzig-Wolfe formulation, Golden and Yee (1979) as well as Lambert et al. (1993) also use a heuristic procedure based on Clarke and Wright (1964). For the computational experiments, Ehmke et al. (2015) embed the feasibility check and the estimation of arrival and start-service times into a tabu search algorithm. Other papers which use the tabu search are Li et al. (2010), Tas et al. (2013), and Zhang et al. (2013). Recourse methods are used in Errico et al. (2016), Li et al. (2010), Novoa et al. (2006), and Zhang et al. (2013). Goel et al. (2019) propose a modified ant colony system to solve a vehicle routing problem with time windows having stochastic customer demands and stochastic service times. Particular attention has been paid to the paper of Ehmke et al. (2015), whose study can be considered at the basis of the presented paper. Summarizing, they deal with a SVRPTW assuming that the time to travel from one customer to another is normally distributed. Their approach is based on the (single) chance constraint method, which consist in accepting a route if the probability of arriving at each customer on time is greater or equal to a fixed threshold. Another assumption they make is that if a vehicle arrives at a customer before the beginning of her time window, then it must wait until the time window starts. One of the main assumptions of their study is the independence of the travel cost variables. Their interpretation of the correlation between arc travel times is that it generally decreases over time; "that is, although the travel time of two arcs may be correlated at any given time, the correlation of the travel time of one arc at the current time with the travel time of another arc at some future time (i.e. after intervening travel and service time) will be less pronounced" Ehmke et al. (2015). In their study, the time-dependency of travel times is not considered. For their computational experiments, they embedded the feasibility check used in route construction as well as the estimation of arrival and start-service times into a tabu search algorithm.

Our contribution
We present a new approach for checking the feasibility of routes of a wide class of variants of the VRPTW subject to stochastic travel times. The most important contributions of this paper are -an investigation of the importance of considering correlations between travel times by solving instances with realistic travel times; -an improved sampling algorithm for situations in which we only require a decision whether a given threshold is exceeded or not; -an approach for checking the feasibility of routes for the single chance constrained routing problem with correlations, not based on sampling; -an investigation of the importance of considering travel times varying over the day by solving instances created with data based on realistic travel times; -an approach for checking the feasibility of routes for the single chance constrained routing problem with time dependencies, not based on sampling; -an approach for checking the feasibility of routes for the joint chance constrained routing problem with and without correlations, not based on sampling; -an estimation of the waiting times of the vehicles at the customer locations that can be used in penalty based approaches.
The main contribution of the paper is the description of an algorithm for checking the feasibility of routes considering correlations and time dependencies at the same time for single chance constrained routing problems. We show that in this setting sampling is not an option and therefore to the best of our knowledge our approach is the only one that can be used.

Outline of the paper
Section 2 presents some notation for the problem we consider. In Sect. 3, we describe our idea of adaptive sampling. Section 4 deals with the single chance constrained version of the problem. In Sect. 5, an approximation of the joint chance constrained problem is discussed. The paper terminates with a summary and a discussion of possible extensions in Sect. 6. In Appendix A, some stochastic formulas used in the proposed algorithms are listed.

Preliminaries and notation
We aim at solving Vehicle Routing Problems with Time Windows subject to uncertain travel times. Extending the terminology of Toth and Vigo (2002) for the deterministic version of the problem, we will refer to this class of stochastic problems as SVRPTW. Every customer is visited exactly once by exactly one vehicle and all vehicle routes start and end at a single depot. More formally, we assume that a finite set C of nodes is given, where 0 ∈ C corresponds to the depot and the remaining nodes in C \ {0} correspond to customers to be served. Moreover, we assume that each pair of nodes i, j ∈ C is connected by a Threshold for feasibility of a route directed arc (i, j) ∈ E, we thus deal with a complete graph G = (C, E) throughout the paper. A route r in G is given by an ordered list of distinct customers c r 1 , . . . , c r k , the set of its arcs is denoted by In our problem, each customer is assigned a deterministic time window. It will be convenient to index the time windows by arcs, thus for an arc e = (i, j) we will denote by a e (b e ) the earliest (latest) start time of service at customer j, so that the time window is defined by [a e , b e ].
All travel times are uncertain and thus modeled as random variables. More precisely, we make the assumption that the vector X ∈ R E , which defines the travel time X e for each arc e, is jointly normally distributed with means μ ∈ R E + and covariance matrix ∈ R E×E , i.e., X ∼ N (μ, ); we will denote the entries of by σ e, f for e, f ∈ E. For simplicity, we assume that is positive definite.
In particular, the travel time of each single arc e is again normally distributed with X e ∼ N (μ e , σ 2 e ), where we set σ e = √ σ e,e . This also implies that the travel time of a route, given as the sum of travel times of the contained arcs, is normally distributed -however, this is only true as long as no time windows are considered. In Sect. 4.2 we additionally assume that travel times are day time dependent, e.g., at 8 a.m. we assume to have more traffic than at midnight. We model this by replacing the expected value and the variance of an arc by functions that depend on the day time. Finally, we will use ∈ (0, 1) throughout the paper to denote the threshold for the risk of a failure, i.e., we will accept a route if the probability that it is infeasible, with respect to the uncertain travel times, is at most . For the reader's convenience, we summarize this notation in Table 1.
In this paper, we restrict ourselves to discussing the following two questions with respect to a fixed route r : (a) is route r feasible with a high enough probability? (b) if so, what is the expected cost of route r ? This is motivated by the fact that many exact as well as heuristic approaches for solving vehicle routing problems can be reduced to the above tasks, in particular when the decision of (a) is a difficult problem in itself. An important example for this class of approaches is the set partitioning approach to the VRP, which is often combined with column generation. Assuming that the full set of feasible routes R is known, together with the corresponding cost c r for each r ∈ R, we can model the final optimization problem as follows: min Here we choose a cheapest subset of all routes such that each customer is visited exactly once by these routes. Throughout the paper, we use this formulation in our experiments. Since all constraints that do not depend on the travel time, such as capacity constraints, can be checked easily in a deterministic preprocessing, we do not consider these constraints in the following. In particular, vehicles with infinite capacity are assumed.
In Sects. 4 and 5, we will concentrate on the above questions (a) and (b) in two different application scenarios. In the first one, a route becomes infeasible if we miss the time window of any of the customers with probability larger than 1 − ; see Sect. 4. In the second scenario, a route is infeasible if the probability of missing at least one customer exceeds 1 − ; see Sect. 5. Even if the difference between the two scenarios seems very subtle, the second approach is much more challenging from a mathematical (and complexity-theoretic) point of view, as it requires to deal with joint chance constraints. In fact, we can only deal with the latter case in an approximate way. In both cases, the expected costs have to take possible waiting times into account.

Adaptive sampling
A standard approach for testing feasibility of a given route r would be to sample the distribution of the involved arc lengths and to check whether the given chance constraints hold with high enough probability. Clearly, the crucial question here is how many samples to compute. If the number is chosen too high, it takes too much time to compute them. If it is chosen too small, the result will be unreliable. However, in our context, we are not interested in the exact probabilities that a route is feasible or not. Instead, we only have to decide whether this probability is greater than the threshold 1 − . This can be exploited in order to adaptively choose the number of samples, using the following classical result due to Hoeffding (1963).
Here, a Bernoulli distribution is a probability distribution of an experiment with binary outcome. E.g., to be sure with probability 99% that the mean does not deviate more than 1% from the real expected value, at most 26492 samples are needed. Algorithm 1 exploits this result in order to reduce the number of samples, in situations where infeasibility can be decided early with a given probability 1 − δ.
Algorithm 1 Adaptive sampling Input: route r ; constant δ ∈ (0, 1); constant s max Output: decision if r is feasible with confidence level at least 1 − δ; cost of r 1: s tot = 0, s inf = 0 total number of (infeasible) samples computed 2: repeat 3: sample the travel times of the arcs in r and set s tot += 1 4: if route r is infeasible with the resulting travel times then 5: Let s max be the number of samples that have to be performed. Building on Theorem 1, Algorithm 1 stops earlier when the probability of infeasibility of the given route can be proven to be higher than a given threshold 1 − δ. With this approach Algorithm 1 avoids a huge number of unnecessary samples compared to the standard sampling and is able to save a huge amount of running time. For routes that are feasible, s max samples are used to guarantee that the computed cost of the route is as accurate as in the case of standard sampling.
In the following sections, we will use this approach for comparison and validation. Our main objective is to develop methods not based on sampling.

SVRPTW -single chance constraints
In this section, we assume that all demands are deliveries, deterministic and known before the optimization process, and not split. A typical application for this scenario are deliveries from a post office. An important characteristic of this kind of problem is that if a customer in a route happens not to be served for some reason, the next customer in this route can still be served by the same vehicle. This is because the service failure at one customer does not undermine the service at the next customer. Therefore, in the chance constraint approach to be developed, a risk level of service failure is fixed for each customer independently. In other words, the feasibility of a route is determined considering the union of the single chance constraints in the route. In the following, we describe our approach in mathematical and algorithmic terms and show experimental results concerning solution quality and running times. We first address the issue of correlations in Sect. 4.1, then we consider time-dependent travel times in Sect. 4.2.

Including correlations
Our first aim is to extend the approach of Ehmke et al. (2015) in order to deal with correlated travel times. As we will show in our experiments in Sect. 4.1.2, taking correlations into account leads to a much more precise assessment of the feasibility of a route.

Algorithm
Assume we are given a route r with arc set E r = {e 1 , . . . , e k , e k+1 } (in this order). As already mentioned, the chance constraint consists in placing a restriction on the probability that a given customer time window is missed. For the first customer, the constraint is easily modeled as P(X e 1 > b e 1 ) ≤ , which is equivalent to b e 1 ≥ μ e 1 + −1 (1 − )σ e 1 where denotes the cumulative distribution function of the standard normal distribution. However, when arriving before a e 1 , the driver has to wait. This may lead to other costs than driving. More importantly, the potential waiting time will influence the arrival times at the following customers in the route. In particular, the distributions of the arrival times at the subsequent customers in the route have to be re-modeled.
As already discussed by Ehmke et al. (2015), the adapted arrival times at a customer c i do not follow a normal distribution any more. Anyway, they propose to approximate the resulting distributions by normal distributions again, iteratively at every customer, and show experimentally that the resulting error is negligible. More precisely, the idea is to compute means and variances of the distributions of every arrival time and to replace the random arrival times by the corresponding normal distributions.
In this section, we follow the same idea, relaxing however the assumption of independent travel times and instead taking into account the information of the correlations between routes and arcs. This is more complicated to do because we now also have to calculate the covariances between routes and arcs. Formally, we replace the vector containing the arrival time at the current node and the travel times of the subsequent arcs in the route by a jointly normally distributed vector having the same means and covariances. Clearly, this generalization leads to a higher running time due to the quadratic input in terms of the covariances, but we think it is worth to follow this approach for having a more realistic formulation and better solutions, as we will show in detail in the following Sect. 4.1.2.
Our approach is described in Algorithm 2. After initialization in Lines 1-6, it loops through the customers of the given route r . It first updates the distribution information of the arrival time at the next customer (ex p k and var k ) as well as its covariance with all arcs f coming later in the route (cov k, f ); see Lines 8-14. Next, the route is discarded in case the chance constraint is violated (Line 15-17). Finally, the distribution information is updated once again due to a possible waiting at the current customer (Lines 18-22). All formulas needed for updating the distributions are derived in Appendix A.
Compared to the method proposed in Ehmke et al. (2015), Lines 3-5, 10-13, 20-22, and 23 are new. All but the latter concern the calculation of the correlations between Algorithm 2 Feasibility check with single chance constraint, taking correlations into account Input: route r = (0, r 1 , . . . , r t , 0) Output: decision if r is feasible with high probability; expected driving time D; expected waiting time W 1: k = 0 2: ex p k = 0, var k = 0 3: for e ∈ E r do 4: cov k,e = 0 5: end for 6: W = 0 7: for e ∈ E r \ (r t , 0) do

8:
ex p k += μ e 9: var k += σ 2 e 10: var k += 2cov k,e 11: for f ∈ E r after e do 12: cov k, f += σ e, f 13: end for 14: k += 1 25: end for 26: D = ex p k + μ (r t ,0) 27: accept route r and return D and W arcs and routes that Ehmke et al. (2015) do not consider because of the assumption of independent travel times. Line 23 calculates the expected waiting time W of the vehicle at every customer. Together with the expected driving time D, it can be used to calculate the expected cost of route r , using any function in the two values D and W .
A first comparison between the method involving dependent travel times proposed in this paper (Algorithm 2) and the method with the assumption of independent travel times of Ehmke et al. (2015) can be already made. By the introduction of covariances it is possible to estimate better the variance of the arrival time at every customer and, consequently, of the travel timeX k of the route at every step k. Without considering the covariances between arcs and routes, the variance is underestimated by Ehmke et al. (2015) in case of positive correlation (see Line 10). For < 0.5 (and particularly for small ), a smaller value of the variance leads to accepting as feasible a higher number of routes. A deeper analysis on the comparison between the solution methods is given in the following Sect. 4.1.2.

Experimental results
In order to investigate the practical relevance of the approach devised above and to illustrate the importance of taking correlations into account, we performed a case study using a set of instances with realistic travel times. The instances used for our experiments are based on real traffic data for the surroundings of the port of Duisburg. The port itself is chosen as depot and 19 nearby locations are picked as customer positions. The expected values and covariances of the travel times were calculated by a sample of data taken on 25 consecutive days at 3 pm from (www.maps.google. com). If necessary, we added a value of 10 −4 to all diagonal entries of the resulting covariance matrix in order to guarantee positive definiteness and to avoid numerical problems. Note that, if any, this has the effect of making the covariances slightly less relevant with respect to the variances.
Each arc of the graph corresponds to the shortest path between two customers or the port and a customer at the time point the data is taken. Therefore, an arc does not necessarily refer to the same path in all of the samples. This seems more realistic because a driver would always choose the shortest path at a given time, since live navigation systems can be used to avoid traffic jams. The considered network of customers and arcs is directed with an asymmetric matrix of the costs that satisfies the triangle inequality.
The arcs lengths observed in our samples ranged between 5 and 167 minutes, with an average of 60 minutes. In the resulting instance, the average correlation coefficient is 0.64 between two adjacent arcs and 0.60 between two non-adjacent arcs. The maximum correlation coefficient is 1.00 in both cases. The minimum correlation is -0.79 between two adjacent arcs and -0.81 between two non-adjacent arcs. This confirms our assumption that it is important to take the covariances into account in real life instances.
We created 10 different instances that only differ from each other in the time windows. The time windows of each instance were computed randomly in the following way: for each customer, the lower bound a e of the time window is chosen uniformly at random as an integer between 0 and 7. The length of all time windows is 1 hour. If the time windows do not allow a feasible solution for the whole VRP for any of the algorithms considered, the time windows are recomputed randomly until they do.
In the following, we compare our new Algorithm 2 to the same algorithm using zero covariances (Algorithm E), which essentially agrees with the algorithm of Ehmke et al. (2015); to our adaptive sampling algorithm of Sect. 3 with s max = 10, 000 and δ = 0.01; and to standard sampling with 10,000 samples, by solving the VRP problem on all 10 instances with each ∈ {0.01, 0.05, 0.1}; see Table 2. We then evaluate the solutions of the algorithms by sampling with 100,000 samples using the covariances, which enables us to calculate the "real" objective value and count how many of the chosen routes are actually infeasible. The objective function is D + 1 2 W , i.e., waiting is half as expensive as driving. For every sample with covariances, we calculated a vector of standard Gaussians z with dimension |E|, multiplied it with a matrix L calculated by Cholesky decomposition of , and added the vector of expected values. For solving the exact VRP we used the Set Partitioning formulation and solved it with CPLEX 12.9. All algorithms are implemented in Java version 1.8.0_212 on an Intel(R) Xeon(R) CPU E5-2680 with 2.8 GHz. Table 2 consists of five main columns. In the first column, the value of is specified. The second column describes the results for the algorithm with covariances and the third for the algorithm without covariances; the fourth and fifth column contain the results for sampling. The second to fifth column are divided into three subcolumns each: the first presents the average cpu time in seconds, the second the total number of "optimal" solutions containing at least one infeasible route, and the third the objective of the method divided by the objective of the algorithm with covariances in the cases in which both methods have produced feasible solutions. In other cases a comparison would be unfair because the algorithm with more infeasible routes clearly has an advantage in terms of the objective value.
We can see that our Algorithm 2 outperforms the algorithm of Ehmke et al. (2015) in terms of feasibility in 4 out of 30 settings: in these four instances it had less infeasible routes in the solution than the algorithm of Ehmke et al. (2015) and there was only one instance in which both algorithms were infeasible. For all other instances, both algorithms returned the same solution and therefore the objective is the same in all feasible settings. The algorithm of Ehmke et al. (2015) needs slightly less running time (between 77 and 79%), which is not surprising because it has to perform less calculations for every route. On the other hand, it computes more feasible routes and therefore the gap is not significant. We can conclude that the advantages of the algorithm considering covariances in terms of feasibility clearly outweigh the slightly higher running time. The adaptive sampling saves in average around 90% of the running time compared to the normal sampling while keeping the same accuracy. It has the same amount of infeasible solutions as our Algorithm 2 but is 2 times faster.

Including time dependency
We next address time dependency, i.e., we now allow that the traveling time of every arc e in the network varies depending on the time of the day. More precisely, we assume that the expected value and the variance of the travel time needed for an arc are functions of the point in time when the arc is entered. In practice, the traffic situation and hence the travel times strongly depend on the time of the day.
The difficulty here is that the time in which an arc is entered is itself a random variable, so that we have to deal with normally distributed random variables having expectations and variances that are implicitly defined by random variables again. An important modeling issue is how to define the dependency, i.e., which type of functions to allow. We decided to use a piecewise constant model, as it keeps the definition of instances easy and at the same time allows to efficiently update expected values and variances in our algorithm. Alternative approaches could use piecewise linear models or polynomials, splines, or even trigonometric approximations.

Algorithm
We assume to have information on travel times for a fixed set t 0 , ..., t of time points during the day. We produce two piecewise constant functions describing the distribution at time t for arc e by the expected value μ e (t) and the variance σ 2 e (t), given the values μ e (t i ) and σ 2 e (t i ) for i = 0, . . . , , as μ e (t) and analogously for σ e (t) (setting t +1 = ∞).
In our algorithm, we do not know the exact time when arc e is entered, it is given by a normal distribution. Hence, we have to consider t a random variable. Given its distribution function F t , we can obtain the expected parameters for the distribution of travel time for e as and analogously for σ 2 e (t). This formula is used in Algorithm 3, Lines 6-7. The remaining parts of Algorithm 3 are analogous to Algorithm 2 except that no correlations are taken into account.

Experimental results
Because we do not have hourly data for more than one day, we artificially extended the instances described above by using data of one day from the same streets hourly taken from 8 am to 5 pm. For every edge and for the value of every hour we computed the quotient to the value of 3 pm. These quotients were multiplied with the expected value of the edge to generate the expected value for that edge in that time. The variances remained unchanged, which is a disadvantage to our algorithm with time dependencies because in reality the variances also tend to be higher in times of the day with more traffic, and our algorithm could exploit this information while the other algorithm cannot.
Results are shown in Table 3. We compare our new Algorithm 3 with Algorithm 2 described above, however without using correlation information, i.e., with the approach proposed by Ehmke et al. (2015); with our adaptive sampling algorithm with s max = 10, 000 and δ = 0.01; and with standard sampling with 10,000 samples. The results show that in 6 out of 30 settings the algorithm without time dependencies produced infeasible routes, whereas the algorithm taking them into account always returned feasible solutions. The feasibility was again checked using sampling with 100,000 samples. Also in terms of solution value the algorithm without time dependency is less efficient and returns solutions with a value between 1 and 3 percent Algorithm 3 Feasibility Check with Single Chance Constraint, taking time dependency into account (but no correlations) Input: route r = (0, r 1 , . . . , r t , 0) Output: decision if r is feasible with high probability; expected driving time D; expected waiting time W 1: k = 0 2: ex p k = 0, var k = 0 3: W = 0 4: for e ∈ E r \ (r t , 0) do  higher. On the other hand, it uses only 12 to 14 percent of the running time compared to the new algorithm. The adaptive sampling algorithm outperforms the normal sampling by far in running time and is nearly as fast as the algorithm that ignores the time dependencies. Both sampling approaches did not produce infeasible solutions.

Combining correlations and time dependency
In the preceding sections, we have explained how to take correlations and time dependency into account separately. It is also possible to combine both in one algorithm. For this, every time we need to know a covariance σ e, f (t e , t f ) between two edges e and f , we need to compute an expected covariance of a two-dimensional piecewise constant function over two jointly normally distributed random variables t e and t f (the starting times of the two edges). As before, the starting time of an edge is just the random variable describing the length of the route up to that time. Therefore, we also need the covariances between subroutes. However, we only need to consider subroutes that start from the depot, so that the number of covariances to be computed remains quadratic in the route length. These covariances can be computed analogously to the covariance between the entire route and a single edge. All necessary data can be computed in the moment when it is required. When using piecewise constant distributions, as above, we need the joint distribution function of two normally distributed random variables to calculate the expected covariance. Computing this is numerically challenging but possible. It is also possible to sample the expected covariance with the given expected values and covariance matrix for the two starting points. This would lead to a hybrid approach using a combination of sampling and a deterministic algorithm.
We would like to emphasize that any sampling approach is impracticable when considering correlations and time dependencies. In both approaches discussed so far, the same samples could be used for all routes, and the covariance matrix was fixed. In particular, the Cholesky decomposition, needed to sample travel times in the dependent case, had to be computed only once. When considering the time-dependent case, we need the knowledge about the order of customers in the route already during the process of sampling. Therefore, we need to consider every route separately and cannot create one sample used for all routes. Even worse, since the expected covariance σ e, f (t e , t f ) depends on the starting times t e and t f now, the Cholesky decomposition cannot be computed in the preprocessing phase any more, since the matrix now depends on t e and t f . This computation is further complicated when taking waiting into account.
As a consequence, sampling is not a practical approach here, so that we cannot present a meaningful comparison with our approach. Moreover, without a samplingbased method, we cannot evaluate feasibility and quality of the solutions computed by our approach as in the last sections. For this reason, we do not present an experimental evaluation for the case of time dependent and correlated travel times.

SVRPTW with backhauls and linehauls -joint chance constraints
So far, we have assumed that feasibility of a route is determined by a high enough probability to reach each customer within its time window. An implicit assumption in our model was that the failure of serving a customer does not influence the feasibility of the rest of the route directly. We now consider the case where failing to serve a customer may directly imply the failure of serving one of the remaining customers on the route, motivated by problems where customer demands can be either deliveries or a pickups. We will refer to this problem as a Stochastic VRPTW with Backhauls (pickup) and Linehauls (deliveries), SVRPTW-BL. Note however that our problem differs from the classical VRP with Backhauls Toth and Vigo (2002), because no precedence constraints on the deliveries are assumed, and it differs from the VRP with Pickup and Delivery Toth and Vigo (2002), in that the latter assumes a delivery and a pickup for each customer.
Our investigation of this VRP variant is motivated by problems in intermodal freight transport, for example freight transportation in intermodal containers using trucks, which involves the distribution of loaded and empty containers between an intermodal facility or depot and fixed export and import customers. In these applications, a large number of different customer request types and container constraints may arise Bomboi and Pruente (2018). In particular, if a customer in a route is not served for some reason, the feasibility of the rest of the route is not guaranteed anymore, e.g., a failed pickup of an empty container leads to the failure in loading the freight at the next customer or a failed delivery leads to a lack of space in the truck for a subsequent pickup.
We thus have to guarantee a service level for the whole route and not for each single customer, that is, we need to consider joint chance constraints in the place of single chance constraints. In other words, we have to deal with the risk of infeasibility of the whole route rather that considering the risk of failing to serve any of the customers in the route independently. For simplicity, we assume here that failing to serve any customer within its time window makes the entire route infeasible. In the remainder of this section, an algorithm for this case, using joint chance constraints in an approximate way, is provided and computational results are presented. In principle, by a combination of our proposed algorithms for the single and the joint chance constraint case, our approach could be extended to the case where not all failures render the rest of the route infeasible. Besides having to deal with joint chance constraints now, another complication is that, after passing a customer, distributions have to be truncated, as we are only interested in travel times under the assumption that the customers visited so far have been served in time.

Algorithm
We now present an algorithm to address the SVRPTW-BL problem described above; see Algorithm 4. It takes correlations into account, but does not deal with time dependency. With respect to Algorithm 2 , two major changes arise. Firstly, instead of checking a chance constraint for every customer, we have to check one chance constraint for the whole route. Unfortunately, dealing with joint chance constraints is hard, and there is no compact formula modeling the joint risk, therefore it is often approximated in the literature by calculating the risk of every single event (here a failure of one customer), summing it up (using total in Algorithm 4), and comparing it to the maximum allowed risk for the joint chance constraint . We follow this approach; the corresponding changes in Algorithm 4 concern Lines 19-22.
When Algorithm 4 ends with a feasible route, i.e. when total ≤ , the value total represents the probability of the route failure. This information could also be used in a bicriteria-style approach: instead of discarding all routes with total > , one could sort the routes by their risk and solve the optimization problem for different risk levels without having to recompute the feasible routes. E.g., in a column generation approach, increasing would then just lead to the addition of more columns.
The second change concerns the truncation discussed above: for calculating the expected value and variance for later arrival times in the route, only the scenarios being feasible so far should be considered. E.g., if a driver misses the time window of the first customer and thus cannot serve it, he cannot continue to the second customer, and therefore the arrival time in this scenario does not influence the arrival time at the second customer, and so on. To model this, we use a one sided truncated normal distribution of the upper tail (usingμ andσ to denote expectations and covariances after truncation). After checking the chance constraint, the truncation for the expected value and for the variance is performed in Line 23 and 24. As we consider joint normal distributions, the truncation of one variable also affects the covariances between two other variables, so the whole submatrix of corresponding to the current route has to be updated; see Lines 25-28. Note that the truncated distributions are again replaced by normal distributions. The formulas used for computing the truncated distributions can be found in Appendix A.

Experimental results
In our experimental evaluation, we create instances in exactly the same way as described in Sect. 4.1.2, except that different sets of time windows may now be discarded due to infeasibility. Table 4 shows the results of a comparison between Algorithm 4 considering correlations and truncations; the variant of it not considering correlations (Algorithm 5); another variant not performing truncations but considering correlations (Algorithm 6); our adaptive sampling algorithm with s max = 10, 000 and δ = 0.01; and normal sampling with 10,000 samples. The algorithm considering correlations and truncations returned in only one of thirty settings an infeasible route. Ignoring the truncations did not lead to more infeasible settings, but decreased the running times by around 85 percent. Ignoring the correlations, on the other hand, led to five times more infeasible settings. In fact, 5 out of 30 settings resulted in an infeasible route in this approach. At the same time, it did not even run faster than the algorithm ignoring truncations because it allows more feasible routes, which outweighs the additional running time per route. Again adaptive sampling is much faster than the normal sampling by keeping the same accuracy. Both sampling methods made two mistakes, which is double as much as Algorithm 4. As a conclusion, the recommended algorithm for solving instances of SVRPTW-BL considers correlations but ignores truncations.

Conclusion
We devised several algorithms for checking the feasibility of a given route in the Stochastic Vehicle Routing Problem with time windows and with correlated and timedependent travel times. They are either based on closed formulas or sampling. We consider either single or joint chance constraints depending on whether missing a customer's time window makes the entire route infeasible or not. These algorithms can be embedded into most algorithms for solving the Vehicle Routing Problem with hard time windows and waiting times as a feasibility check. Due to the fact that the algorithms also compute expected waiting times, they can be easily adapted to variants of the problem with soft time windows and penalties. The experimental results obtained in our case study show that the algorithms assess the feasibility of a given route with a reasonable precision, in particular when correlations are taken into account. However, for a large number of samples, the latter approach is time consuming.
Considering the complementary strengths of the deterministic and the sampling approach, one could also investigate combinations of both approaches. E.g., one could check for a given route how likely it is to become infeasible, and if the result is very close to the value of acceptance, do a second check performing sampling with a high number of samples -possibly only for routes that were accepted, but very close to be not accepted, because adding an infeasible route is in general more problematic than missing a feasible one. Another hybrid approach would be to use sampling with a high number of samples just as a callback inside the optimization process: first compute an optimal solution using our approach, then check the feasibility of all routes used in this solution by sampling, and if some route turns out to be infeasible, remove it and re-optimize. The advantage of this approach is that the sampling has to be performed for a very small set of routes and can thus be performed with a large number of samples in reasonable time.
Funding Open Access funding enabled and organized by Projekt DEAL.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

A Computation of updated moments
For the convenience of the reader, we explain in the following how to compute the expected values, variances and covariances of the random variables appearing in Algorithms 2 and 4. For this, we use the assumptions and notation of Sect. 2, in particular, we assume X ∼ N (μ, ) and thus X i ∼ N (μ i , σ 2 i ). For indices i 1 , . . . , i k , we consider the joint density function of X i 1 , . . . , X i k , given by whereμ denotes the vector with entries μ i 1 , . . . , μ i k and¯ the corresponding submatrix of . We again assume here that and hence¯ is positive definite.

Rectified Gaussian distributions
We first consider the random variable max{c, X i } for a constant c. For the expected value, we obtain To compute the variance V ar(max{c, X i }) = E(max{c, X i } 2 ) − E(max{c, X i }) 2 (A.2) we can use (A.1) and Finally, we have ∞ c ts f X i ,X j (s, t) ds dt = cμ j P(X i ≤ c) + (μ i μ j + σ i j )P(X i ≥ c) + μ j σ ii f X i (c) and thus obtain Cov(X i , max{c, X j }) = E(X i · max{c, X j }) − E(X i )E(max{c, X j }) = σ i j P(X j ≥ c) (A.3)

Truncated Gaussian distributions
We next develop formulas for the moments of the random vector X truncated by a condition X i < b, where b is again a constant. For the expected values, we have .4) and, in particular, For the covariances, we have which can be computed using (A.4) and As a special case, we obtain a formula for which also applies to the case i = j.