1 Introduction

Small drones, also called unmanned aerial vehicles (UAVs), have successfully made their way into civil applications. They fly routes autonomously, carry cameras for aerial photography, and transport goods from one place to another. The range of applications is broad, including aerial monitoring of plants and agricultural fields as well as support for first responders in case of disasters (Andre et al. 2014; Kovacina et al. 2002; Lima et al. 2014; Quaritsch et al. 2008). Delivering goods with a fleet of drones (Grippa 2016; Grippa et al. 2017) becomes an option if classical means of transportation, such as trucks, trains, and planes, are inappropriate. This is the case if roads, railway tracks, or landing facilities do not exist, if weather conditions make it impossible to use them, or if their use is too dangerous or time-consuming. A compelling service in this context is the delivery of medicine, vaccines, or laboratory samples for patients in remote areas and crisis regions. Such a service is also worthwhile in densely populated metropolitan areas when congestion makes roads nearly impassable.

This article investigates drone-based delivery at a system level. The entities of the system are goods, customers, vehicles, and depots. Customers request goods that are stored in depots and delivered by vehicles. Service requests, also denoted as jobs or customer requests, are not known in advance and arrive over time at certain locations according to a space-time stochastic process. We introduce an approach to dimension the system (How many vehicles and depots are needed for a certain area?) and propose and analyze policies for job assignment (How to assign customer requests to vehicles to minimize the expected delivery time?).

If the vehicles are capable of consecutively serving several customers before they return to a depot, e.g., if goods are lightweight and total distances are small, the problem of selecting customer requests to be served falls into the domain of dynamic vehicle routing with stochastic demands, dating back to Bertsimas and Ryzin (1991, 1993a, b). In contrast, vehicles in our system serve no more than one customer per trip for capacity reasons, and we use the term job assignment to emphasize the difference from routing. Focusing on depot-to-customer delivery, we complement research on routing policies for wide-area surveillance (Bullo et al. 2011; Enright et al. 2009; Frazzoli and Bullo 2004; Pavone et al. 2011; Savla et al. 2008).

We model the system as an M / G / K queue with \({K>1}\). Our performance measure is delivery time, i.e., the time it takes for a customer from requesting to obtaining a good. The stability of this service is linked to the queuing of jobs, i.e., customers may have to wait until other customers have been served. The system becomes unstable if the average number of waiting customers persistently increases over time.

Our contributions are as follows: First, we give a lower bound for the expenditure needed to set up a stable system as a function of the targeted average delivery time. Second, we analyze two simple job assignment policies: nearest job first to random vehicle (NJR) and first job first to nearest vehicle (FJN). These policies are suboptimal, but their performance yields insight into the system dynamics, which helps us to design advanced policies. Third, we analyze a policy known from queuing theory, named first job to vehicle with the smallest workload (\({\text {FJW}\pi }\)) in this paper, and modify it according to the lessons learned from NJR and FJN. We propose the first job first to vehicle with smallest additional workload (\({\text {FJW}\delta }\)) policy. \({\text {FJW}\pi }\) is not optimal in light load and stabilizes the system only for a fraction of the possible arrival rates. The novel \({\text {FJW}\delta }\) is optimal in light load and stabilizes the system for almost all arrival rates. Fourth, we show that FJN and FJW exhibit a tipping point behavior: One vehicle makes the difference between almost optimal performance and instability. Therefore, careful dimensioning is required. In contrast, \({\text {FJW}\delta }\) can stabilize the system for almost all arrival rates, as long as the number of vehicles per depot is sufficient. This finding yields a connection between dimensioning and control of the system. Fifth, this last finding enables proper dimensioning of the delivery system controlled with \({\text {FJW}\delta }\). In particular, the infrastructure of depots is subject to a long-term decision, whereas the number of vehicles can be modified in the short term. The setup of the system (number of depots and vehicles) shapes its financial costs and, in conjunction with the job assignment policy, the service quality provided to the customers, for which they are willing to pay. 
Hence, the minimum expenditure required to provide a certain service quality is a key question when investing in such a delivery system. The lower bound mentioned in the first point provides an answer. We illustrate the application of this service-possibility frontier with parameters reported by the company Matternet. Parts of this article related to the first, second, and fifth points are published in Grippa et al. (2017). The bound in the first point has been improved. All results and discussions related to FJW policies are novel.

2 Related work

2.1 Types of vehicle routing problems (VRPs)

The framework of our model dates back to Dantzig and Ramser (1959), who introduced their formulation of the vehicle routing problem (VRP) as a generalization of the traveling salesman problem (Flood 1956). Since then, the operations research community has intensively studied how a central planner determines optimal sets of routes for fleets of homogeneous vehicles that supply given sets of geographically dispersed customers with goods (Golden and Assad 1998). In the context of a “classical VRP”, such an optimal set of routes ensures that (i) all customers are supplied with the requested products, (ii) none of the vehicles exceeds its capacity along its route, (iii) no customer is visited more than once, (iv) all routes start and end at one central depot, and (v) the overall routing cost is minimized. In practical applications, VRPs come with a broad diversity of additional requirements and operational constraints affecting the construction of the optimal set of routes. Examples are periodic VRPs (Angelelli and Speranza 2002), VRPs with pickup and delivery (Desaulniers et al. 2002), VRPs with split deliveries (Dror et al. 1994), and VRPs with time windows within which customers must be served (Cordeau et al. 2002). For reviews of exact and approximate methods for solving the classical VRP, we refer to Baldacci et al. (2007), Cordeau et al. (2007), Laporte (1992, 2007, 2009), and Toth and Vigo (2002a, b) and, for an exhaustive bibliography on vehicle routing, to Laporte and Osman (1995).

VRPs are classified according to the nature of the system input information. If all system input is known before the vehicles leave the depot(s) and does not change during mission execution, the problem is the one described in the paragraph above and is said to be both static and deterministic. For many real-world applications, at least some input information, such as customer arrivals, follows a probability distribution rather than being known a priori. These VRPs are denoted as stochastic. If some input information appears or changes during mission execution and has to be integrated immediately into decision-making, the VRP is called dynamic (Psaraftis 1995, 1998, or, for a recent review, Pillac et al. 2013). Then, designing sets of routes has to be replaced by designing routing policies, which describe the evolution of motion paths as a function of newly arriving input.

2.2 Stochastic and dynamic vehicle routing in robotics and aeronautics

Bertsimas and Ryzin (1991, 1993a, b) were the first to comprehensively analyze a stochastic and dynamic VRP. The problem is also named the dynamic traveling repair-person problem (DTRP): Requests are distributed according to a spatio-temporal stochastic process. Every request is associated with one location. Vehicles travel from request to request and spend time at each location providing on-site service. This problem has received considerable attention in robotics and aeronautics. Particular attention was devoted to the motion coordination of mobile robots, including spatially distributed surveillance policies for UAVs that adapt to network changes (Bullo et al. 2011; Enright et al. 2009; Frazzoli and Bullo 2004; Pavone et al. 2011; Savla et al. 2008). In some of these policies, named partitioning policies, the service area is partitioned into sub-areas, one for each vehicle, and every vehicle serves only the requests in its sub-area according to certain rules (Pavone et al. 2011). Strategies were developed that ensure that a certain fraction of requests is served before the jobs expire (Pavone et al. 2009), that account for service priorities (Smith et al. 2010) and translating requests (Bopardikar et al. 2010), and that accomplish effective system management without explicit communication (Arsie et al. 2009).

Considerably less work was dedicated to another vehicle routing problem called dynamic pickup and delivery problem (DPDP) (Swihart and Papastavrou 1999; Waisanen et al. 2008). In this problem, every request is associated with two locations (pickup and delivery) and, at each service, a vehicle transports a good from pickup to delivery location. This problem is different from the DTRP because the service of the request requires the vehicle to change location. Both pickup and delivery locations are drawn according to a continuous spatial distribution, i.e., goods are not stored in fixed depots.

In our problem, goods are available in depots with fixed locations. This problem can be seen either as a DTRP with a non-Euclidean distance or as a DPDP with pickup locations restricted to the depot locations. Therefore, the approaches developed in related work are not directly applicable to this problem. Furthermore, we explicitly take battery charging into account.

3 System model

3.1 Entities of a delivery system

The system is composed of \({K\in \mathbb {N}}\) vehicles moving in a bounded and convex service area \({\mathcal {A}\subset \mathbb {R}^2}\) of size \(A{:}{=}\Vert \mathcal {A}\Vert \), where \(\Vert \cdot \Vert \) denotes the area when applied to a set and the Euclidean norm when applied to a vector. A vehicle is denoted by \(v_{k}\) with identifier \({k\in \{1,\ldots ,K\}}\). The current position of \(v_{k}\) at time \(t \ge 0\) is \({\mathbf {v}_{k}(t)\in \mathcal {A}}\). All vehicles travel at the same constant velocity \({\nu \in \mathbb {R}^+}\) and are equipped with a battery whose charge level at time t is \({b_k(t) \in [0,1]}\). The need to recharge or exchange batteries is quantified by the air-time ratio \(\alpha \in (0,1]\), i.e., \({\alpha = \text {air time}/(\text {air time + charge time})}\).

The arrival of delivery requests for goods within \(\mathcal {A}\) is generated by a Poisson process with finite intensity \({\lambda \in \mathbb {R}^+ }\), where \(\lambda \) is the arrival rate. The requests, also called jobs, are indexed by the job identifier \({n \in \mathbb {N}}\), which indicates the order of request arrivals. The corresponding customer is called \(c_{n}\); his or her position is denoted by \(\mathbf c _n \in \mathcal {A}\) and is assumed to be independently and uniformly distributed in \(\mathcal {A}\). The information about the waiting customers is stored centrally. In distributed policies (NJR), vehicles periodically access this information and select jobs. In centralized policies (FJN and FJW), a central unit periodically asks the vehicles for information and assigns jobs to them. We assume that communication happens just before selection/assignment. However, in FJW policies, communication and assignment can be postponed without any effect on the policy outcome (see Sect. 6.1). Customer requests differ not only in timing and location but also in the goods requested.

Goods are, in general, different but have the same expiration date and are treated with identical priority. The system has \({L\in \mathbb {N}}\) depots to store goods. The depots are interconnected and provide a sufficient stock of all goods as well as service activities, like recharging batteries. We assume that, for capacity reasons, a vehicle cannot serve more than one customer request per trip (see Bertsimas and Ryzin 1993a, p. 71). The depots are set up at locations \({\mathbf {d}=[\mathbf {d}_1\;\mathbf {d}_2\;\ldots \;\mathbf {d}_L] \in \mathcal {A}^L}\), where \(\mathbf {d}\) is chosen such that the expected distance between a random point \({\mathbf {q}\in \mathcal {A}}\) (a potential request), drawn from a uniform distribution over \(\mathcal {A}\), and its closest depot is minimal:

$$\begin{aligned} H_{L}(\mathbf d ,\mathcal {A})\,{:}{=}\, \frac{1}{A}\cdot \int _{\mathcal {A}}\min _{l:l\in \{1,\ldots ,L\}}\Vert \mathbf d _l-\mathbf q \Vert d\mathbf q . \end{aligned}$$

This corresponds to the solution of the continuous multimedian problem known from geometric optimization (Papadimitriou 1981; Zemel 1984):

$$\begin{aligned}&\mathbf d ^* = \arg \min _\mathbf{d \in \mathcal {A}^L} H_{L}(\mathbf d ,\mathcal {A}), \end{aligned}$$
$$\begin{aligned}&\text{ with } \quad H^*_{L}(\mathcal {A}) {:}{=}H_{L}(\mathbf d ^*,\mathcal {A}). \end{aligned}$$
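The objective \(H_{L}(\mathbf d ,\mathcal {A})\) can be estimated numerically. The following minimal Python sketch does so by Monte Carlo sampling for a rectangular area. For illustration only, it assumes that the depots sit at the centres of an \(m\times m\) grid of equal square cells. This is a natural candidate for \(L\in \{1,4,9,16\}\) in a square, but it is not claimed to solve the multimedian problem. The function names are hypothetical.

```python
import math
import random


def grid_depots(side, m):
    """Hypothetical helper: m*m depots at the cell centres of a side x side square."""
    cell = side / m
    return [((i + 0.5) * cell, (j + 0.5) * cell) for i in range(m) for j in range(m)]


def estimate_h(depots, width, height, samples=100_000, seed=0):
    """Monte Carlo estimate of H_L(d, A) for A = [0, width] x [0, height].

    Draws points q uniformly from A and averages the distance from q to the
    closest depot, i.e. the integrand of the L-median objective.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        qx, qy = rng.uniform(0.0, width), rng.uniform(0.0, height)
        total += min(math.hypot(qx - dx, qy - dy) for dx, dy in depots)
    return total / samples
```

As a sanity check, the exact mean distance from the centre of a square of side \(s\) to a uniform point in it is \(s(\sqrt{2}+\ln (1+\sqrt{2}))/6\approx 0.383\,s\). For the 16 km\(^2\) square used in Sect. 5.2, the estimate should therefore be close to 1.53 km with one central depot and 0.38 km with 16 grid depots.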

A depot is a storage facility but not a permanent home base for particular vehicles. Whenever a vehicle has delivered a good, it approaches either the nearest depot or one that is more suitable to handle the next customer request. Suitability is determined by the job assignment policy.

3.2 Service operations and delivery time

The delivery time for customer \(c_{n}\) (\(n \in \mathbb {N}\)) is denoted by the random variable \({T_n = W_n + R_{n} + S_n}\), where \(W_{n}\) is the waiting time, \(S_{n}\) is the service time, and \(R_{n}\) is the return time. Figure 1 illustrates all operations involved in the service of \(c_{n}\) from arrival at \(t=\tau _0\) to service completion at \(t=\tau _{10}\). The waiting time is \({W_{n}=\tau _5-\tau _0}\). The return time is \({R_{n}=\tau _7-\tau _5}\). It depends on the position of the customer \(c_{n^-}\) served before \(c_n\) and on the system’s load. \(R_{n}\) lies in \([0,R'_{n}]\): it is zero if a vehicle is ready at the depot, and it reaches its maximum \(R'_{n}\) if the service of \(c_{n^-}\) is not yet completed when \(c_n\) arrives. The service time is \({S_{n}=\tau _{10}-\tau _7}\).

In the context of “classic” queuing theory, the term “service time” denotes the total time \(B_n\) to process one request, which corresponds to \({S_n + R_n}\) in this paper. Bertsimas and Ryzin (1991, 1993a, b) use service time to denote the time spent on-site at the request location, which corresponds to the unloading time in this paper.

We neglect the times to load and unload goods. As introduced previously, the service provided to customers is the transport of goods from a depot to the customers. In practical applications it is reasonable to assume that loading and unloading times are negligible compared to the time needed for travel. These times could be modeled but besides slightly different numerical values they would not change the qualitative results of our analysis.

Fig. 1
Time intervals involved in a customer service: delivery time T, waiting time W, return time R, and service time S

3.3 Queuing phenomena and stability

A job assignment policy \(\Pi \) has to keep the number of outstanding jobs bounded (Bertsimas and Ryzin 1991, 1993a, b; Bullo et al. 2011; Enright et al. 2009; Frazzoli and Bullo 2004; Pavone et al. 2011; Savla et al. 2008). Policy \({\Pi }\) is referred to as stabilizing if the expected number of pending jobs (customers waiting for service) remains bounded over time, i.e., if there exists a constant \(\kappa <\infty \) such that

$$\begin{aligned} \bar{N}_{\Pi }\,{:}{=}\, \lim _{t \rightarrow \infty } \mathbb {E}[N (t)|\Pi ]\le \kappa , \end{aligned}$$

where N(t) denotes the number of pending jobs at time t. We assume that \({N(0)=0}\), i.e., no customer is waiting at \(t=0\).

The return and service times are crucial for the stability of the system. A necessary condition for stability is (Bertsimas and Ryzin 1993a, p. 63):

$$\begin{aligned} \frac{\bar{D}}{\nu } \le \frac{K}{\lambda } \end{aligned}$$

if the on-site service time is zero. Here, \(\bar{D}\) is the average Euclidean distance between two customers served in sequence, \(\lambda \) is the arrival intensity, and \(\nu \) is the vehicle speed. This condition applies to our problem with two changes. First, the distance becomes the customer-depot-customer distance, which, divided by the speed, gives \(\bar{R'}+\bar{S}\) (Fig. 1), where \(\bar{R'}\) is the average return time in high load and steady state, and \(\bar{S}\) is the average service time in steady state. Second, the effective number of vehicles in use is on average \(\alpha K\), where \({\alpha }\) is the air-time ratio. Therefore, our stability condition is

$$\begin{aligned} \bar{R'}+\bar{S} \le \frac{\alpha K}{\lambda } \;. \end{aligned}$$

The problem analyzed in this paper can be modeled as an M / G / K queue with interdependent service times \(B_n\). M indicates Poisson distributed customer arrivals, G indicates service times distributed according to a generic distribution, and K is the number of servers. For M / G / K queues with independent service times (Kleinrock 1975), the load factor is defined as

$$\begin{aligned} \rho \, {:}{=}\, \frac{\lambda \bar{B}}{\alpha K} \;. \end{aligned}$$

The system is said to be in a low-load condition if \(\rho \rightarrow 0\) and in a high-load condition if \(\rho \rightarrow 1\). A necessary condition for the stability of the system is \({\rho <1}\). In case of stability, \(\rho \) can be interpreted as the expected fraction of busy servers. This definition does not directly apply to our case because \(\bar{B}\) depends on the system state (Kleinrock 1975), which includes the number of waiting customers. Nevertheless, the stability condition remains valid: \(\bar{B}\) approaches \(\bar{R'}+\bar{S}\) for \(\rho \rightarrow 1\), leading to (6). In words, to stabilize the system, the average time a vehicle needs to complete one service request must not exceed the average time between two customer arrivals multiplied by the average number of available vehicles.
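As a quick numerical reading of (6) and (7), the sketch below computes the load factor and checks the necessary stability condition. The helper names and the value \(\bar{R'}+\bar{S}=3\) min are hypothetical. Only \(\lambda =0.65\) requests/min and \(\alpha =0.25\) are taken from the simulation setup in Sect. 5.2. Because \(\bar{B}\) depends on the system state, \(\rho \) is evaluated here with the high-load value \(\bar{B}\approx \bar{R'}+\bar{S}\).

```python
def load_factor(arrival_rate, mean_processing_time, num_vehicles, air_time_ratio):
    """rho = lambda * B_bar / (alpha * K), cf. (7)."""
    return arrival_rate * mean_processing_time / (air_time_ratio * num_vehicles)


def meets_necessary_condition(arrival_rate, mean_return_high_load, mean_service,
                              num_vehicles, air_time_ratio):
    """Necessary (not sufficient) stability condition (6):
    R'_bar + S_bar <= alpha * K / lambda."""
    return (mean_return_high_load + mean_service
            <= air_time_ratio * num_vehicles / arrival_rate)


# Hypothetical high-load processing time R'_bar + S_bar = 1.5 + 1.5 = 3 min.
for k in (4, 12):
    rho = load_factor(0.65, 3.0, k, 0.25)
    ok = meets_necessary_condition(0.65, 1.5, 1.5, k, 0.25)
    print(f"K={k}: rho={rho:.2f}, necessary condition met: {ok}")
```

With these assumed numbers, the loop reports \(\rho =1.95\) (condition violated) for \(K=4\) and \(\rho =0.65\) (condition met) for \(K=12\).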

4 Expenditure for minimum infrastructure

We derive the minimum infrastructure expenditure necessary to build a stable system as a function of system performance, starting with a lower bound on the average delivery time \(\bar{T}\). The average delivery time (with respect to the request location) is minimal if a vehicle is ready at the depot nearest to the request location. In this case, both the waiting time and the return time are zero. Thus, the minimum average delivery time equals the minimum average service time: \({\bar{T}\ge \bar{T}_\text {min} = \bar{S}_\text {min}}\). The latter can be expressed in terms of the \(L\)-median function (3) as

$$\begin{aligned} \bar{S}_\text {min} = \frac{H^*_{L}(\mathcal {A})}{\nu }\;. \end{aligned}$$

Since the \(L\)-median function decreases with \(L\), a minimum number of depots \(L\) is necessary to achieve the targeted performance \(\bar{T}\). The minimum \(L\) as a function of \(\bar{T}\) is non-increasing and piecewise constant, and it is defined as

$$\begin{aligned} L(\bar{T}) = \min \left\{ l\in \mathbb {N} : \frac{H^*_{l}(\mathcal {A})}{\nu } \le \bar{T}\right\} \;. \end{aligned}$$

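In high load, a vehicle has to fly from the previously served customer to a depot and from there to the next customer. On average, each of these two legs is at least as long as the distance from a uniformly distributed customer to its nearest depot. The average time to process a job is therefore bounded by \({\bar{B} \ge 2 \bar{S}_\text {min}}\). Using this bound in (7) together with \({\rho <1}\), we derive a condition on the minimum number of vehicles necessary for a stable system with a given configuration of depots: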

$$\begin{aligned} K> \frac{\lambda 2 \bar{S}_\mathrm {min} }{\alpha } \;. \end{aligned}$$

By using (8) in (10) and rounding up to the next integer, we obtain

$$\begin{aligned} K(L) = \left\lceil \frac{2\lambda }{\alpha } \frac{H^*_{L}(\mathcal {A})}{\nu } \right\rceil . \end{aligned}$$
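Equation (11) is straightforward to evaluate once \(H^*_{L}(\mathcal {A})\) is known. In the following sketch, the function name is hypothetical. The inputs \(H^*_{1}\approx 1.5304\) km and \(H^*_{16}\approx 0.3826\) km are assumptions: they correspond to a 4 km \(\times \) 4 km square with depots at the centres of equal square cells, where the mean distance to the nearest depot is \((\sqrt{2}+\ln (1+\sqrt{2}))/6\approx 0.3826\) times the cell side. The remaining values follow Sect. 5.2: \(\lambda =0.65\) requests/min, \(\alpha =0.25\), and \(\nu =30\) km/h \(=0.5\) km/min.

```python
import math


def min_vehicles(arrival_rate, air_time_ratio, speed, h_star):
    """K(L) = ceil(2 * lambda / alpha * H*_L(A) / nu), cf. (11).

    Units must be consistent, e.g. requests/min, km/min and km.
    """
    s_min = h_star / speed  # minimum average service time, cf. (8)
    return math.ceil(2.0 * arrival_rate * s_min / air_time_ratio)


# Assumed H*_L values (km) for a 4 km x 4 km square with grid-placed depots.
for depots, h_star in ((1, 1.5304), (16, 0.3826)):
    print(depots, "depot(s):", min_vehicles(0.65, 0.25, 0.5, h_star), "vehicles at least")
```

This yields at least 16 vehicles for one depot and at least 4 vehicles for 16 depots. Since (11) rests on the bound \(\bar{B}\ge 2\bar{S}_\text {min}\), these numbers are necessary for stability, not sufficient.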

For \(C_\mathrm {d}\) and \(C_\mathrm {v}\) denoting the costs of a depot and a vehicle, respectively, the total infrastructure expenditure is determined by the piecewise constant function

$$\begin{aligned} C(\bar{T}) = C_\mathrm {d}\, L(\bar{T}) + C_\mathrm {v}\, K(L(\bar{T})) \;. \end{aligned}$$

For some combinations of parameters, the function above increases with the targeted average delivery time \(\bar{T}\): As \(\bar{T}\) increases, the minimum number of depots \(L(\bar{T})\) decreases, and more vehicles are needed to stabilize the system. Depending on the ratio between \(C_\mathrm {d}\) and \(C_\mathrm {v}\), this may increase the total infrastructure expenditure. However, the minimum infrastructure expenditure necessary to build a stable system, denoted by \(I_\mathrm {min}\), has to be a non-increasing function of \(\bar{T}\), because a less demanding target can never require a larger investment. By construction, this is accomplished as follows: For two configurations of depots with \({L'<L''}\), \({\bar{T}'_\text {min}>\bar{T}''_\text {min}}\), and \({C'>C''}\), we can achieve delivery time \(\bar{T}'_\text {min}\) at lower total expenditure by employing \(L=L''\) instead of \(L=L'\) depots and delaying every delivery by \({\bar{T}'_\text {min}-\bar{T}''_\text {min}}\) time units. This yields

$$\begin{aligned} I_\mathrm {min}(\bar{T}) = \min _{l\in \mathbb {N},\; l\ge L(\bar{T})} \left( C_\mathrm {d}\, l + C_\mathrm {v}\, \left\lceil \frac{2\lambda }{\alpha } \frac{H^*_{l}(\mathcal {A})}{\nu } \right\rceil \right) \;, \end{aligned}$$

which constitutes the minimum infrastructure expenditure necessary to build a stable system that meets a targeted performance \(\bar{T}\). Note that (13) is sufficient to guarantee system stability if the job assignment policy can stabilize the system for all \(\rho <1\).

The minimum infrastructure expenditure depends on the shape of the service area \(\mathcal {A}\) through the terms \(H^*_{L}(\mathcal {A})\) and \(L(\bar{T})\). A lower bound on \({I_\mathrm {min}(\bar{T})}\) that does not depend on the shape of the service area is obtained by considering

$$\begin{aligned} H^*_{L}(\mathcal {A}) \ge a\sqrt{\frac{A}{L}} \;, \end{aligned}$$

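where \(a=2/(3 \sqrt{\pi })\). This is equivalent to considering circular sub-areas (Zemel 1984): the mean distance between the center of a disk of area \(A/L\) and a uniformly distributed point in it is exactly \(a\sqrt{A/L}\). If the right-hand side of (14) is used instead of the \(L\)-median function in (9), the minimum number of depots is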

$$\begin{aligned} L(\bar{T}) = \left\lceil { \frac{a^2A}{\nu ^2 \bar{T}^2} } \right\rceil . \end{aligned}$$

By using (14) and (15) in (13), we obtain the lower bound. In general, the bound is not tight, but depending on the parameters it can be very close to the actual minimum infrastructure expenditure.
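Combining (13), (14), and (15) gives a shape-independent lower bound that can be computed in a few lines. The sketch below is an illustration under stated assumptions. The function names and the costs \(C_\mathrm {d}\) and \(C_\mathrm {v}\) are hypothetical. The infinite range of \(l\) in (13) is handled by stopping as soon as the depot cost \(C_\mathrm {d}\, l\) alone reaches the best total found so far. This is exact because the vehicle term is non-negative, and it requires \(C_\mathrm {d}>0\).

```python
import math

A_CONST = 2.0 / (3.0 * math.sqrt(math.pi))  # constant a in (14)


def min_depots(target_time, area, speed):
    """L(T_bar) from (15)."""
    return math.ceil(A_CONST ** 2 * area / (speed ** 2 * target_time ** 2))


def vehicles_lower_bound(l, arrival_rate, air_time_ratio, area, speed):
    """K(l) from (11) with H*_l(A) replaced by its lower bound a * sqrt(A / l)."""
    h_bound = A_CONST * math.sqrt(area / l)
    return math.ceil(2.0 * arrival_rate * h_bound / (air_time_ratio * speed))


def expenditure_lower_bound(target_time, area, speed, arrival_rate,
                            air_time_ratio, cost_depot, cost_vehicle):
    """Lower bound on I_min(T_bar), cf. (13); returns (cost, number of depots).

    Requires cost_depot > 0 so that the search over l terminates.
    """
    l = min_depots(target_time, area, speed)
    best_cost, best_l = math.inf, l
    # Once the depots alone cost at least best_cost, larger l cannot help.
    while cost_depot * l < best_cost:
        cost = cost_depot * l + cost_vehicle * vehicles_lower_bound(
            l, arrival_rate, air_time_ratio, area, speed)
        if cost < best_cost:
            best_cost, best_l = cost, l
        l += 1
    return best_cost, best_l
```

The example uses the parameters of Sect. 5.2 (\(A=16\) km\(^2\), \(\nu =0.5\) km/min, \(\lambda =0.65\) requests/min, \(\alpha =0.25\)) and a hypothetical target \(\bar{T}=2\) min, for which (15) gives \(L(\bar{T})=3\). With hypothetical unit costs \(C_\mathrm {d}=10\) and \(C_\mathrm {v}=1\), the bound is attained at \(l=3\) with cost 40. With \(C_\mathrm {d}=1\) and \(C_\mathrm {v}=10\), it moves to \(l=16\) with cost 56. This illustrates how the cost ratio shifts the optimum toward more depots.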

We show in the following sections that one vehicle can make the difference between almost optimal performance and an unstable system (see Fig. 3). Therefore, choosing a sufficient number of vehicles and depots is crucial for a reliable system. We call this operation system dimensioning. In Sect. 7 we use (13) in a numerical example and discuss the result.

5 Simple job assignment policies

5.1 Description of policies

Table 1 Features of job assignment policies

Job assignment goes beyond merely picking the next request. It has to specify all decisions needed to operate a delivery system, including which customer request to serve first, which vehicle should serve the next customer request, at which depot vehicles load goods, when to recharge the battery, and where vehicles return to if no customers are waiting. We first analyze two simple classes of job assignment policies: nearest job first to random vehicle (NJR) and first job first to nearest vehicle (FJN). NJR policies select jobs based on the location of the customer; FJN policies select jobs based on the arrival time of the customer requests. By comparing these two policies, we seek insight into the following question: Is it worth delaying the job assignment decision to obtain more information about new customer requests in order to reduce the delivery time? For both classes, we evaluate two extreme cases: assignment as soon as possible (just after the previous service; NJR-soon and FJN-soon) and assignment as late as possible (just before loading the good; NJR-late and FJN-late).

A first issue is the timing of job assignments. It seems plausible that decisions should be made as late as possible to use the most recent information about the system status, i.e., individual job assignments should be postponed until goods are loaded at the depot. However, delayed decisions can also have a negative effect. For \(L\ge 2\), one of the most important decisions is where a vehicle should return to after satisfying a customer request. A non-postponed decision about the next job may include a “clever” choice of the depot to travel to, while a postponed decision may leave the vehicle at a suboptimal place at the moment of job assignment. In that case, the advantage gained by processing more recent information is counteracted by the extra distance traveled to the next customer. We investigate this effect in Sect. 5.4.

A second issue is the coordination mechanism that prevents more than one vehicle from being assigned to the same job. Such a scenario, where a particular customer, say \(c_{\hat{n}}\), is selected by \(K'\) vehicles with \({2 \le K' \le K}\), can easily occur if the system is not fully utilized, i.e., for \({\rho \rightarrow 0}\). FJN policies are centralized: A central entity selects the next job and assigns it to the nearest vehicle. Hence, the assignment is based on the arrival times and locations of waiting customers and on the positions of depots and vehicles. NJR policies are distributed: Every vehicle selects jobs based on its position relative to the positions of customers and depots. Whenever a vehicle selects a job, the job is removed from the list of waiting jobs. Every vehicle knows its own position and the positions of depots and waiting customers. For these policies, the coordination mechanism is randomized: If more than one vehicle is ready to take a new job at a given time, a random priority order is assigned to the vehicles involved. The randomized mechanism corresponds to a real case in which vehicles are not coordinated at all and the connection delay to the list of waiting customers is unpredictable because it depends on network conditions.

Table 1 shows all simple policies and classifies them according to three features: the order in which jobs are served, the timing of assignment, and the coordination of jobs in case of ambiguous assignments. A feature included in all policies is the effect of a vehicle \(v_{k}\)’s battery level, \({b_k(t)}\), on the selection of new jobs. In particular, if the battery level is below \(30\%\) at the time of job selection/assignment, the vehicle approaches the nearest depot to recharge and does not select a job until its battery level has reached \(80\%\). If no service request arrives, \(v_{k}\) fully recharges.
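The battery rule above behaves like a hysteresis with two thresholds. The following is a minimal Python sketch with a hypothetical class name. It only decides whether a vehicle with current level \(b_k(t)\) may accept a job; the recharging process itself is not modeled.

```python
LOW_THRESHOLD = 0.3     # below this level the vehicle stops accepting jobs
RESUME_THRESHOLD = 0.8  # it accepts jobs again once recharged to this level


class BatteryGate:
    """Per-vehicle state for the 30 % / 80 % recharging rule (hypothetical helper)."""

    def __init__(self):
        self.recharging = False

    def may_take_job(self, level):
        """Return True if a vehicle with battery level `level` in [0, 1] may take a job."""
        if self.recharging:
            # Stay at the depot until the resume threshold is reached.
            if level >= RESUME_THRESHOLD:
                self.recharging = False
        elif level < LOW_THRESHOLD:
            # Head to the nearest depot and start recharging.
            self.recharging = True
        return not self.recharging
```

A vehicle at 25 % is blocked, stays blocked at 60 %, and is released again at 80 %. Separating the two thresholds keeps a vehicle from toggling between recharging and serving around a single level.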

5.1.1 NJR policies

In NJR-soon, after completing a service, the vehicle \(v_{k}\) selects the next customer such that the travel distance to this customer via any depot is minimized. In mathematical terms, the vehicle solves

$$\begin{aligned} \min _{\begin{array}{c} l:l\in \{1,\ldots ,L\}\\ n:c_{n}\in \mathcal {N} (\tau ) \end{array}}\Vert \mathbf v _{k}(\tau )-\mathbf d _l \Vert + \Vert \mathbf d _l-\mathbf c _n \Vert \,, \end{aligned}$$

where \(\mathcal {N}(\tau )\) is the set of waiting customers at time \(\tau \), and \(\tau \) is the time instant at which the previous job is completed. Here, \(\tau \) coincides with the time instant of the new job assignment. The assignment is made only if the battery level at time \(\tau \) is high enough, i.e., \({b_k(\tau ) \ge 0.3}\); otherwise, the vehicle returns to the nearest depot. Every vehicle has to make \({L\cdot N (\tau )}\) comparisons to conclude an individual job assignment. If the same job minimizes the travel distance of more than one vehicle, only one randomly chosen vehicle serves it, according to the random coordination described above.

In NJR-late, after completion of a service, the vehicle \(v_{k}\) first travels to the closest depot. The vehicle then selects the nearest customer, i.e., it solves:

$$\begin{aligned} \min _{n:c_{n}\in \mathcal {N}(\tau + R)}\Vert \mathbf v _{k}(\tau + R)-\mathbf c _n \Vert , \end{aligned}$$

where \(\tau + R\) is the time instant at which \(v_{k}\) arrives at the depot. The vehicle remains in the depot if there is no waiting customer. If the same job is nearest to more than one vehicle, the serving vehicle is determined randomly. This policy requires \(L+ N (\tau +R)\) comparisons.
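Both NJR selection rules, (16) and (17), reduce to small searches over waiting customers and depots. The following sketch assumes that positions are 2-D tuples in km and that waiting jobs are kept in a dictionary keyed by job identifier \(n\). These data structures and the function names are hypothetical. Ties among several vehicles, which the policies resolve by random priority, are not modeled here. Within one vehicle's search, the first minimizer found is kept.

```python
import math


def njr_soon(vehicle_pos, depots, waiting):
    """NJR-soon, cf. (16): choose job n and depot l minimising
    |v_k - d_l| + |d_l - c_n|. Returns (job_id, depot_index) or None."""
    best = None
    for n, c in waiting.items():
        for l, d in enumerate(depots):  # L * N(tau) comparisons
            dist = math.dist(vehicle_pos, d) + math.dist(d, c)
            if best is None or dist < best[0]:
                best = (dist, n, l)
    return None if best is None else (best[1], best[2])


def njr_late(depot_pos, waiting):
    """NJR-late, cf. (17): the vehicle is already at its closest depot and picks
    the nearest waiting customer. Returns job_id, or None to stay at the depot."""
    if not waiting:
        return None
    return min(waiting, key=lambda n: math.dist(depot_pos, waiting[n]))


depots = [(0.0, 0.0), (4.0, 0.0)]
waiting = {1: (0.0, 1.0), 2: (4.0, 1.0)}
print(njr_soon((3.0, 0.0), depots, waiting))  # (2, 1): job 2 via depot 1
print(njr_late((0.0, 0.0), waiting))          # 1: job 1 is nearest to depot 0
```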

5.1.2 FJN policies

Following a first-come, first-served (FCFS) policy, a central unit always selects the oldest unaddressed request, i.e., the smallest n with \({c_{n} \in \mathcal {N} (t)}\), where t is the time instant of the job assignment. Let this request be denoted by \(\underline{n}(t)\). In particular, under policy FJN-soon, whenever at least one vehicle completes a service at time \({t=\tau }\), a central controller selects the next customer in line, i.e., request \(\underline{n}(\tau )\). It then chooses the vehicle \(v_{k^* }\) and the depot \(d_{l^*}\) on its path that minimize the total distance from the vehicle's current position \(\mathbf v _{k}(\tau )\) via the depot's position \(\mathbf d _l\) to the customer's position \(\mathbf c _{\underline{n}(\tau )}\):

$$\begin{aligned} \min _{\begin{array}{c} l:l\in \{1,\ldots ,L\}\\ {k:k\in \mathcal {K'}} \end{array}}\Vert \mathbf v _{k}(\tau )-\mathbf d _l \Vert + \Vert \mathbf d _l-\mathbf c _{\underline{n}(\tau )} \Vert , \end{aligned}$$

where \(\mathcal {K'}\) is the set of vehicles ready for service and \(K'=|\mathcal {K'}|\) is their number. Following (18), a total of \(K' \cdot L\) computations is needed to determine which vehicle serves the selected customer and via which depot.

Obeying the rules of policy FJN-late, whenever at least one vehicle is available at a depot, the central unit selects the request \(\underline{n}(\tau +R)\) and performs \(K'\) computations to determine the vehicle nearest to that job. Subsequently, \(L\) computations are performed to find the depot closest to the selected customer where the vehicle has to return to after delivery:

$$\begin{aligned} \min _{k:k\in \mathcal {K'}}\Vert \mathbf v _{k}(\tau +R)-\mathbf c _{\underline{n}(\tau +R)} \Vert + \min _{l:l\in \{1,\ldots ,L\}}\Vert \mathbf c _{\underline{n}(\tau +R)} - \mathbf d _l \Vert . \end{aligned}$$
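The FCFS selection and the FJN rules (18) and (19) can be sketched as follows. The sketch assumes a hypothetical data layout: 2-D position tuples, a dictionary of waiting jobs keyed by \(n\), and vehicles keyed by an identifier. The function names are illustrative and not part of the original work.

```python
import math


def oldest_job(waiting):
    """FCFS selection: the smallest job identifier n among waiting jobs."""
    return min(waiting) if waiting else None


def fjn_soon(customer_pos, ready_vehicles, depots):
    """FJN-soon, cf. (18): choose vehicle k and depot l minimising
    |v_k - d_l| + |d_l - c|. Returns (vehicle_id, depot_index) or None."""
    best = None
    for k, v in ready_vehicles.items():
        for l, d in enumerate(depots):  # K' * L evaluations
            dist = math.dist(v, d) + math.dist(d, customer_pos)
            if best is None or dist < best[0]:
                best = (dist, k, l)
    return None if best is None else (best[1], best[2])


def fjn_late(customer_pos, vehicles_at_depots, depots):
    """FJN-late, cf. (19): nearest vehicle already at a depot (K' evaluations),
    then the depot nearest to the customer for the return (L evaluations)."""
    if not vehicles_at_depots:
        return None
    k = min(vehicles_at_depots,
            key=lambda vid: math.dist(vehicles_at_depots[vid], customer_pos))
    l = min(range(len(depots)), key=lambda i: math.dist(customer_pos, depots[i]))
    return k, l


depots = [(0.0, 0.0), (4.0, 0.0)]
waiting = {7: (4.0, 1.0), 9: (0.0, 3.0)}
n = oldest_job(waiting)  # job 7 arrived first
print(fjn_soon(waiting[n], {"a": (1.0, 0.0), "b": (3.0, 3.0)}, depots))  # ('a', 1)
print(fjn_late(waiting[n], {"a": (0.0, 0.0), "b": (4.0, 0.0)}, depots))  # ('b', 1)
```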

5.2 Simulation setup

For each of the simple policies, we simulate the movement of \({K\in \{1,\ldots ,24\}}\) vehicles in a square area of \({A=16 \text{ km }^2}\). The \({L\in \{1,4,9,16\}}\) depots are located such that the average distance of any potential request in \(\mathcal {A}\) from the nearest depot is minimized. Customer requests are assumed to arrive randomly over time at a constant rate \({\lambda =0.65\, \text{ requests/min }}\). All vehicles travel at a constant velocity of \(\nu =30 \text{ km/h }\). Acceleration and deceleration phases, as well as the extra time for takeoff and landing, are neglected. The vehicles’ air time is limited by their finite battery capacity, and batteries are assumed to be full at \(t=0\). We set the ratio between air time and charge time to 1/3, i.e., \({\alpha =0.25}\). Moreover, every \(v_{k}\) is placed at one of the depots at \(t=0\). Finally, we neglect the loading time of goods.

In the simulations, the system presented in Sect. 3 is observed every \(\delta \) time units. Interarrival times are generated according to a geometric distribution, so that the number of customer arrivals per time unit follows a binomial distribution with parameters \(h=1/\delta \) and \(p=\lambda \delta \). The Poisson distribution is a good approximation of the binomial distribution if \(p \le 0.08\) and \({h\ge 1500 p}\) (Bronshtein et al. 2007). Therefore, for all of our simulation runs, we choose \(\delta \) such that \({\delta \le \min \{0.08/\lambda ,1/\sqrt{1500\lambda }\}}\).
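The choice of the observation step can be checked numerically. The sketch below computes the largest admissible \(\delta\) for \(\lambda = 0.65\) requests/min and generates arrivals with one Bernoulli trial per slot, which yields geometric interarrival times and binomial counts per time unit. The function names are hypothetical, and the code illustrates the sampling scheme rather than reproducing the authors' simulator.

```python
import math
import random

def max_time_step(lam: float) -> float:
    """Largest step delta with p = lam*delta <= 0.08 and h = 1/delta >= 1500*p,
    i.e. delta <= min(0.08/lam, 1/sqrt(1500*lam))."""
    return min(0.08 / lam, 1.0 / math.sqrt(1500.0 * lam))

def bernoulli_arrivals(lam: float, delta: float, horizon: float, rng: random.Random):
    """One Bernoulli trial with success probability p = lam*delta per slot of
    length delta; the gaps between arrivals (in slots) are then geometric."""
    p = lam * delta
    n_slots = int(horizon / delta)
    return [s * delta for s in range(n_slots) if rng.random() < p]

lam = 0.65                    # requests per minute
delta = max_time_step(lam)    # about 0.032 min, roughly 1.9 s
arrivals = bernoulli_arrivals(lam, delta, horizon=10_000.0, rng=random.Random(1))
print(delta, len(arrivals) / 10_000.0)  # empirical rate should be close to 0.65
```

For \(\lambda = 0.65\), the second condition (\(h \ge 1500p\)) is the binding one.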

The data collected from the simulations are used to compute the average delivery time. First, we estimate the length of the warm-up phase using Welch’s method (Welch 1983). Second, we compute the average delivery time with the replication/deletion method (Law 2007), which is equivalent to the independent replications method (Welch 1983). These methods are used for the statistical analysis of the data and enable us to estimate expected values and confidence intervals.
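Welch's procedure and the replication/deletion estimate can be sketched as follows. The smoothing window of width \(2w+1\) corresponds to \(2\tilde{w}+1 = 1001\), i.e., \(\tilde{w}=500\), in Fig. 2. The hard-coded Student-t quantile (1.833, for 10 replications and a two-sided 90% interval), the function names, and the synthetic data are assumptions made for illustration.

```python
import math
import statistics
from typing import List, Sequence

def welch_moving_average(replications: Sequence[Sequence[float]], w: int) -> List[float]:
    """Welch's procedure: average the i-th observation over all replications,
    then smooth with a centred moving average of width 2w+1 (narrower,
    still centred windows for the first w points)."""
    m = min(len(r) for r in replications)
    ybar = [statistics.fmean(r[i] for r in replications) for i in range(m)]
    smoothed = []
    for i in range(m - w):
        window = ybar[: 2 * i + 1] if i < w else ybar[i - w : i + w + 1]
        smoothed.append(statistics.fmean(window))
    return smoothed

def replication_deletion(replications, warmup: int, t_quantile: float = 1.833):
    """Drop the first `warmup` observations of each run, average the rest,
    and build a confidence interval across runs. 1.833 is the Student-t
    0.95 quantile for 9 degrees of freedom (10 runs, two-sided 90%)."""
    means = [statistics.fmean(r[warmup:]) for r in replications]
    centre = statistics.fmean(means)
    half_width = t_quantile * statistics.stdev(means) / math.sqrt(len(means))
    return centre, half_width

# Synthetic data: 10 runs with a transient of 50 jobs, then a run-specific level.
runs = [[10.0] * 50 + [2.0 + 0.1 * j] * 450 for j in range(10)]
curve = welch_moving_average(runs, w=25)   # inspect where the curve flattens
mean, hw = replication_deletion(runs, warmup=50)
print(mean, hw)
```

In practice, the warm-up length is read off the plotted `curve` by inspection, as done for Fig. 2, and then passed to `replication_deletion`.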

For every parameter setup, we perform 10 simulation runs with at least 4000 jobs each. If it is impossible to judge whether the system reaches steady state within this period, we simulate 10,000 customer requests, which corresponds to more than 10 days of continuous operation.

5.3 Warm-up phase

We discuss the computation of the warm-up phase for NJR-soon and FJN-soon for \({K\in \{11,12,14,16\}}\) vehicles and \({L=16}\) depots. Figure 2 reveals that the length of the transient phase decreases with an increasing number of vehicles, i.e., with a decreasing load factor. We conclude that delivery systems with fewer vehicles must be simulated for a longer period (higher number of demands \(\tilde{n}\)) to reach steady-state conditions. Conservatively, we estimate the length of the warm-up phase as \({\tilde{n}_{\mathrm {wu}}=3000}\) for \(K=11\), \({\tilde{n}_{\mathrm {wu}} =2000}\) for \(K=12\), and \({\tilde{n}_{\mathrm {wu}} =500}\) for all other cases.

Fig. 2

Delivery time averaged over 10 simulation runs and \({2\tilde{w}+1}=1001\) demands vs. demand index n for \(K\) vehicles; \({L=16}\), \({A=16 \text{ km }^2}\), \({\lambda =0.65 \text{ req./min. }}\), \({\nu =30 \text{ km/h }}\), and \({\alpha =0.25}\)

Fig. 3

Expected delivery time (90% confidence) versus number of vehicles for \(L\) depots; \({A=16 \text{ km }^2}\), \({\lambda =0.65 \text{ req./min. }}\), \({\nu =30 \text{ km/h }}\), and \({\alpha =0.25}\). a Number of depots \(L=1\). b Number of depots \(L=4\). c Number of depots \(L=9\). d Number of depots \(L=16\)

Fig. 2 also anticipates a result discussed with Fig. 3d below: FJN-soon can yield substantially better system performance than NJR-soon. However, FJN-soon cannot stabilize the system for \(K=11\).

5.4 Performance evaluation and lessons learned

Figure 3 shows the average delivery time \(\bar{T}\) for different job assignment policies as a function of the number of vehicles \(K\) and the number of depots \({L}\). The shaded areas indicate “impossible regions”, i.e., all ordered pairs \((K,\bar{T})\) for which either \(\rho > 1\) or \(\bar{T}<\bar{T}_\mathrm {min}\) or both. The following basic phenomena can be observed.

5.4.1 Coordination mechanism and low load

FJN policies are optimal in light load, i.e., the best vehicle to serve a request (in order to minimize the delivery time) is the vehicle nearest to that request. In light load, increasing the number of vehicles \(K\) has no effect on performance: additional vehicles are not used because, on average, a vehicle is ready to serve every new request from the nearest depot. As the load factor increases, FJN policies become unstable at arrival rates \(\lambda \) well below the theoretical limit. This is probably due to their non-optimal battery management: in high load, vehicles spend a significant amount of time recharging their batteries while customers are waiting. Therefore, a good policy has to include the battery level in the job assignment decision.

The randomized coordination mechanism of NJR policies performs poorly in light load. Even if vehicles are available at the depot nearest to a new request, the serving vehicle might come from another depot. This increases the minimum average service time and dramatically degrades performance.

5.4.2 Tipping point behavior

As the number of vehicles decreases, FJN policies show a tipping point behavior: One vehicle makes the difference between almost optimal performance and instability. Under FJN policies, once the number of vehicles falls below a certain threshold, the vehicles often have to change depots. Such depot switching increases the minimum service time, making the system unstable, and the non-optimal battery management aggravates the phenomenon. In terms of queuing theory, this extreme behavior is explained by the fact that the delivery time grows much faster with the parameters defining the load factor (\(\lambda \), \(\bar{B}\), \(\alpha \), \(K\)) than in “classic” queues. Bertsimas and van Ryzin report the same for waiting time and arrival rate (Bertsimas and Ryzin 1991). In a “classic” queue, a decrement \(\Delta K\) in the number of vehicles would increase the load factor according to \({\lambda \bar{B}/(\alpha (K-\Delta K))}\). In our problem, the average time to process a request \(\bar{B}\) also increases once the number of vehicles falls below a certain threshold. Therefore, the load factor increases according to \({\lambda (\bar{B}+\Delta B)/(\alpha (K-\Delta K))}\), which causes a larger increase in delivery time. Given this relationship between delivery time and number of vehicles, and since the number of vehicles is discrete, one vehicle can make the difference between being in mid/light load and exceeding the stability region. The difference in stability between FJN and NJR policies stems from their differing ability to keep \(\bar{B}\) small as the load factor increases.
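The tipping-point argument can be illustrated with numbers. The sketch uses \(\lambda = 0.65\) requests/min and \(\alpha = 0.25\) from the simulation setup, but the mean processing time \(\bar{B} = 4\) min and its growth \(\Delta B = 0.5\) min are purely hypothetical values, chosen to show how removing a single vehicle pushes \(\rho\) past 1 only when \(\bar{B}\) grows as well.

```python
def load_factor(lam: float, mean_busy_time: float, alpha: float, num_vehicles: int) -> float:
    """rho = lam * B / (alpha * K), with lam in requests/min and B in min."""
    return lam * mean_busy_time / (alpha * num_vehicles)

lam, alpha = 0.65, 0.25
B, dB = 4.0, 0.5  # hypothetical: mean time per request (min) and its growth once depot switching starts

rho_12 = load_factor(lam, B, alpha, 12)             # K = 12
rho_11_classic = load_factor(lam, B, alpha, 11)     # K = 11, B unchanged ("classic" queue)
rho_11_drone = load_factor(lam, B + dB, alpha, 11)  # K = 11, B grows by dB
print(f"{rho_12:.3f} {rho_11_classic:.3f} {rho_11_drone:.3f}")  # 0.867 0.945 1.064
```

With \(\bar{B}\) fixed, losing one vehicle keeps the system stable; if \(\bar{B}\) also grows, the same loss crosses the stability limit.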

Despite the random coordination, NJR-soon performs better than the FJN policies in high load. In high load, there are fewer conflicts, and assigning the nearest job limits the transitions between depot sub-areas (with high probability, there are requests in the same sub-area). However, this ability to stabilize the system for higher arrival rates comes at the cost of fairness: requests are effectively prioritized by their distance from the depot, so a subset of requests is served with long delivery times.

5.4.3 Timing matters

For FJN policies, timing has no effect on delivery time and system stability (see Fig. 3). For NJR policies, in contrast, an early decision is beneficial for stability, i.e., NJR-soon is more robust than NJR-late: the negative effect of returning to a depot that is (on average) sub-optimally located for the next job outweighs the positive effect of more recent information. The gap between the two NJR policies widens with more depots, i.e., the effect of “being at a suboptimal place” is more pronounced in systems with many depots. This happens because in heavy load, randomized job coordination is rarely necessary (if at all), and NJR behaves similarly to a nearest-neighbor policy (as mentioned above), which serves the nearest demand after every service completion.

6 Workload-based job assignment

6.1 Description of policies

We now study two policies that take into account the workload of vehicles. The first is known from queuing theory (Asmussen 2003); the second is an extension based on the lessons learned from the analysis of simple policies. The workload at time t is the remaining time that a vehicle needs to complete all assigned jobs.

The first policy is first job to the vehicle with the smallest workload (\(\text{ FJW }\pi \)). It serves jobs in FCFS order and assigns each job to the vehicle with the smallest workload. At every job arrival, a central unit asks each vehicle to compute its current workload. All vehicles send back their computed values, and the central unit assigns the job to the vehicle with the minimum value. A vehicle with no workload heads toward the nearest depot to charge its battery. For a standard G / G / K queue, this policy minimizes all moments of the waiting time, number of jobs in the system (queue length), and workloads for all vehicles (Asmussen 2003). In our delivery system, however, the policy performs poorly because it considers neither the positions of the vehicles relative to the customer and depots nor the energy needed to do the job.

The second policy is first job to the vehicle with the smallest additional workload (\(\text{ FJW }\delta \)). Instead of using the current workload of a vehicle as an assignment criterion, this policy employs the amount of workload that the new job adds to the current workload. The key idea behind this policy is to minimize the overall workload in the entire system in the long run. \(\text{ FJW }\delta \) considerably increases stability and reduces delivery time. The additional workload accounts for the flight from the previous customer (or from the current location if the vehicle is not serving a customer) to the depot to pick up the new good, the battery charging at that depot (enough to reach the new customer and come back to the nearest depot), and the transport of the good from the depot to the customer.

It should be emphasized that the policy is scalable with the number of vehicles and depots: Each vehicle computes the minimum of \(L\) additional workload values to choose the depot for every assignment, and the central unit computes the minimum of \(K\) additional workload values to choose the vehicle. The policy can thus be applied in large systems. Alternatively, it can be applied locally by dividing the multi-depot system into many single-depot systems.
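The two workload-based rules can be sketched as follows. The vehicle-side function evaluates the \(L\) candidate depots and reports its smallest additional workload; the central step compares the \(K\) reported values. The charging model (a linear rate of \((1-\alpha)/\alpha\) minutes of charging per minute of flight, charging only the shortfall) and all names (`Vehicle`, `additional_workload`, `assign_fjw_delta`, `assign_fjw_pi`) are assumptions for illustration; the original work may model energy differently.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class Vehicle:
    position: Point   # last customer of its assigned jobs, or current location if idle
    workload: float   # remaining time (min) to complete all assigned jobs
    air_time: float   # remaining battery flight time (min) available at `position`

def additional_workload(v: Vehicle, customer: Point, depots: List[Point],
                        speed: float, alpha: float) -> Tuple[float, int]:
    """Vehicle-side step of FJW-delta: O(L) work, one candidate per depot.
    Assumed energy model: linear charging of only the missing flight time."""
    charge_per_flight_min = (1.0 - alpha) / alpha
    return_leg = min(math.dist(customer, d) for d in depots) / speed
    best = (math.inf, -1)  # stays (inf, -1) if no depot is reachable
    for l, depot in enumerate(depots):
        to_depot = math.dist(v.position, depot) / speed
        if to_depot > v.air_time:
            continue  # depot not reachable on the remaining charge
        delivery = math.dist(depot, customer) / speed
        shortfall = max(0.0, delivery + return_leg - (v.air_time - to_depot))
        best = min(best, (to_depot + charge_per_flight_min * shortfall + delivery, l))
    return best

def assign_fjw_delta(vehicles: List[Vehicle], customer: Point, depots: List[Point],
                     speed: float, alpha: float):
    """Central step: K comparisons of the values reported by the vehicles."""
    bids = [additional_workload(v, customer, depots, speed, alpha) + (k,)
            for k, v in enumerate(vehicles)]
    delta, l, k = min(bids)
    return k, l, delta

def assign_fjw_pi(vehicles: List[Vehicle]) -> int:
    """FJW-pi: the vehicle with the smallest current workload, ignoring geometry."""
    return min(range(len(vehicles)), key=lambda k: vehicles[k].workload)

depots = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]   # km
fleet = [Vehicle((1.0, 1.0), 0.0, 20.0),   # idle at depot 0, full battery
         Vehicle((3.2, 3.0), 6.0, 10.0)]   # busy, last customer near depot 3
job = (3.0, 3.4)
k, l, delta = assign_fjw_delta(fleet, job, depots, speed=0.5, alpha=0.25)
print(assign_fjw_pi(fleet), (k, l))  # 0 (1, 3)
```

In this toy example, \(\text{ FJW }\pi \) picks the idle vehicle 0, whose depot is far from the customer, whereas \(\text{ FJW }\delta \) picks vehicle 1, whose last customer lies next to depot 3, because the job adds only about 1.2 min to its workload.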

Some design choices of \(\text{ FJW }\delta \) can be traced back to the lessons learned from the simple policies: First, we use FCFS to ensure fairness among requests with varying distances from the nearest depot. Second, we explicitly take into account the battery level in job assignments. Third, assignments can be postponed without impact on performance until the workload of the chosen vehicle is zero. This is important from an implementation point of view, as the system can then be operated well in areas with poor network coverage: vehicles can exchange information with the central unit whenever they are intermittently connected. Good assignments are especially important in systems with high load, and since vehicles in high load take on average a long time to reach a workload of zero, the communication can be postponed over a longer period. In this way, \(\text{ FJW }\delta \) combines the best of the two NJR policies: NJR-soon has good performance but strict connectivity requirements; NJR-late has relaxed connectivity requirements but poor performance.

6.2 Performance evaluation

Fig. 4

Expected delivery time (90% confidence) for workload-based job assignment policies; \({L=4}\), \({K=12}\), \({A=16 \text{ km }^2}\), \({\nu =30 \text{ km/h }}\), and \({\alpha =0.25}\)

Figure 4 shows the expected delivery time over the job arrival rate for FJW policies. \(\text{ FJW }\pi \) is suboptimal in low load (similar to NJR) and cannot stabilize the system for high load. This is because the policy assigns the job to the vehicle that will be ready for service first but which might be far away from the customer. In contrast, \(\text{ FJW }\delta \) is optimal in low load and performs very well in high load. It is optimal in low load because the chosen vehicle typically comes directly from the depot nearest to the customer it will serve.

A necessary condition for an optimal policy is that it performs at least as well on the global level as on the local level. This is true in our case: \(\text{ FJW }\delta \) applied at the local level performs worse than at the global level for mid load. The reason is as follows: In mid load, there is a benefit in exchanging vehicles between depots, so that vehicles from depots with few jobs can help at depots with many jobs. As the load increases, all depots have many jobs, so vehicles tend to stay at their own depot.

It is of interest for dimensioning and controlling the system to investigate whether \(\text{ FJW }\delta \) can stabilize the system for all possible load factors \(\rho \). Here, \(\rho \) is obtained from (7) by setting \(\bar{B}\) to its high-load lower bound \(2\bar{S}_\text {min}\) and using (8):

$$\begin{aligned} \rho = \frac{\lambda }{\alpha K} \frac{2 H^*_{L}(\mathcal {A})}{\nu }\;. \end{aligned}$$
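The high-load load factor can be evaluated for the simulated scenarios. The sketch assumes that \(H^*_L(\mathcal{A})\) is the mean distance from a uniformly distributed customer to the nearest depot and that the \(L\) depots sit at the centres of a regular grid, so that each cell contributes the known mean distance \((\sqrt{2}+\ln(1+\sqrt{2}))/6 \approx 0.3826\) times its side length. Units are minutes and kilometres (\(\nu = 0.5\) km/min); the function names are hypothetical.

```python
import math

# Mean distance from the centre of a unit square to a uniformly random point in it.
MEAN_DIST_FROM_CENTRE = (math.sqrt(2.0) + math.asinh(1.0)) / 6.0  # about 0.3826

def h_star_grid(area_km2: float, num_depots: int) -> float:
    """Mean customer-to-nearest-depot distance (km) for uniform demand,
    assuming depots at the centres of an n x n grid (L = n^2)."""
    n = math.isqrt(num_depots)
    if n * n != num_depots:
        raise ValueError("grid assumption needs a square number of depots")
    return MEAN_DIST_FROM_CENTRE * math.sqrt(area_km2) / n

def high_load_rho(lam: float, alpha: float, num_vehicles: int,
                  h_star: float, speed: float) -> float:
    """rho = lam / (alpha K) * 2 H*_L / nu (lam in 1/min, H* in km, nu in km/min)."""
    return lam / (alpha * num_vehicles) * 2.0 * h_star / speed

for L, K in [(1, 20), (4, 10), (16, 12)]:
    rho = high_load_rho(0.65, 0.25, K, h_star_grid(16.0, L), 0.5)
    print(L, K, round(rho, 3))
```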

Simulation results suggest that \(\text{ FJW }\delta \) can indeed do so if the number of vehicles per depot \(K/L\) is large enough. Figure 5 shows the maximum load factor for which the policy is able to keep the system stable:

$$\begin{aligned} \rho _\mathrm{stable} = \sup \left\{ \rho \in [0,1] : \lim _{t \rightarrow \infty } \mathbb {E}[N (t) \mid \rho , \text{ FJW }\delta ]\le \kappa \right\} \end{aligned}$$

with some \(\kappa < \infty \). Let us interpret the results in more detail. If there are many vehicles per depot, the average return plus service time in high load depends on the distance between customer and nearest depot, i.e., it is \(2H^*_{L}(\mathcal {A})/\nu \). As the number of vehicles decreases, serving customers in FCFS order forces vehicles to leave their depot area and change depots. Once this occurs, the average return time increases and the system soon becomes unstable. Curve fitting suggests that the stability threshold \(\rho _\mathrm{stable}\) increases with \(K/L\) according to an exponential law:

$$\begin{aligned} \rho _\mathrm{stable} = 1- e^{-\gamma K/L} \, \end{aligned}$$

with \(\gamma = 2.73\) in this case. Due to this law, only a few vehicles per depot are needed to stabilize the system for almost all load factors. For example, with one vehicle per depot (\(K/L=1\)), the fitted law gives \(\rho _\mathrm{stable}=1-e^{-2.73}\approx 0.93\), so a load of 90% can be carried with only one vehicle per depot.
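The fitted law can also be inverted to read off how many vehicles per depot a target load requires, \(K/L = -\ln(1-\rho_\mathrm{stable})/\gamma\). The sketch below uses \(\gamma = 2.73\) from the text; since the law is a curve fit, extrapolating it beyond the simulated configurations is an assumption, and the function names are hypothetical.

```python
import math

GAMMA = 2.73  # fitted value reported in the text

def rho_stable(vehicles_per_depot: float, gamma: float = GAMMA) -> float:
    """Fitted stability threshold: 1 - exp(-gamma * K/L)."""
    return 1.0 - math.exp(-gamma * vehicles_per_depot)

def vehicles_per_depot_needed(target_rho: float, gamma: float = GAMMA) -> float:
    """Inverse of the fitted law: K/L = -ln(1 - rho) / gamma."""
    return -math.log(1.0 - target_rho) / gamma

print(rho_stable(1.0))                              # about 0.935
ratio = vehicles_per_depot_needed(0.9)              # about 0.84 vehicles per depot
print(ratio, math.ceil(16 * ratio))                 # with L = 16: at least 14 vehicles
```

For instance, a target of 90% with \(L=16\) depots gives \(K/L \approx 0.84\), i.e., at least 14 vehicles under this fit.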

Fig. 5

Maximum load factor reachable by the policy FJW\(\delta \) as function of the number of vehicles per depot; \({A=16 \text{ km }^2}\), \({\nu =30 \text{ km/h }}\), and \({\alpha =0.25}\)

Results of Fig. 5 are obtained as follows: For every given combination of \((L,K)\), we find the stability threshold \(\rho _\mathrm{stable}\) by visual inspection of \(\bar{T}(\lambda )\)-plots. Each point on the \(\bar{T}(\lambda )\)-plot is obtained by simulating the system ten times over 100,000 jobs. In all scenarios, depots are located to minimize the \(L\)-median function.

A comparison of \(\text{ FJW }\delta \) with the simple policies NJR and FJN confirms these findings (see Fig. 3). \(\text{ FJW }\delta \) outperforms both in all cases except the configuration \((K,L)=(7,16)\), where NJR-soon seems to have a lower average delivery time. \(\text{ FJW }\delta \) is still preferable because of the fairness issues of NJR-soon. However, even though \(\text{ FJW }\delta \) works very well, the tipping point behavior cannot be eliminated from a dimensioning point of view. Therefore, a reliable system inevitably calls for careful dimensioning.

7 Dimensioning

For this reason, we propose a method for dimensioning the system, i.e., for selecting the number of vehicles and depots. We have seen that the number of depots must be chosen in relation to the size of the service area, and that this long-term choice on depot infrastructure must be coordinated with the short-term choice on vehicles. For company-specific parameter values, a diagram like Fig. 6 translates the insights derived above into the monetary domain. It relates a company’s expenditure I for depots and vehicles to the average delivery time \(\bar{T}\). The purpose of such a plot is to support decision making for companies that set up an airborne delivery system equipped with small UAVs.

Fig. 6

Relation between infrastructure expenditure and average delivery time; \({A=16 \text{ km }^2}\), \({\lambda =0.65 \text{ req./min. }}\), \({\nu =30 \text{ km/h }}\), \({\alpha =0.25}\), \({C_v=2000 \text{ US }\$}\), \({C_d=\hbox {20,000} \text{ US }\$}\)

To give an example, Fig. 6 is produced on the assumption that a UAV suitable to deliver two-kilogram packages costs 1000 US$ plus a maintenance cost of 100 US$ per annum, and a depot costs 15,000 US$ plus a maintenance cost of 500 US$ per annum. Operating the system over ten years, the costs per vehicle and per depot are \(C_v=2000\) US$ and \(C_d=20{,}000\) US$, respectively. These parameter values are those assumed by the company Matternet (Raptopoulos 2012).
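The ten-year unit costs and the resulting expenditure can be reproduced with a few lines. The linear expenditure model \(I = L\,C_d + K\,C_v\) and the function names are assumptions for illustration.

```python
def lifetime_cost(purchase: float, maintenance_per_year: float, years: int) -> float:
    """Purchase price plus yearly maintenance over the operating period."""
    return purchase + maintenance_per_year * years

C_v = lifetime_cost(1000, 100, 10)    # per vehicle: 2000 US$
C_d = lifetime_cost(15000, 500, 10)   # per depot: 20,000 US$

def expenditure(num_depots: int, num_vehicles: int,
                c_depot: float = C_d, c_vehicle: float = C_v) -> float:
    """Assumed linear model: I = L * C_d + K * C_v."""
    return num_depots * c_depot + num_vehicles * c_vehicle

print(C_v, C_d)              # 2000 20000
print(expenditure(1, 20))    # 60000
print(expenditure(4, 10))    # 100000
```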

A lower bound on the expenditure required to build a stable system, denoted by \(I_\text {min}\), is derived in Sect. 4 and is given by (13). Figure 6 plots this bound for the given parameters, where all values below the bound are shown as a shaded area. The bound has a “staircase” shape with tread levels being the minimum average delivery time achievable with a particular number of depots. No operable system exists for parameters in this area, while every combination of I and targeted \(\bar{T}\) located above fulfills \({\rho <1}\) and \({\bar{T}> \bar{T}_\mathrm {min}}\). The bound corresponds to a service possibility frontier; it gives a necessary but not sufficient condition for infrastructure expenditure. The actual performance is policy dependent, and more financial resources than \(I_\text {min}\) may be needed to operate the system in a stable manner and to meet the targeted performance. The performance of the FJW\(\delta \) policy is plotted for scenarios with a varying number of depots.

A company that wants to operate a delivery service and serve customers within a certain average delivery time can use such a diagram as follows: If the average delivery time should be no more than \(\tau \), the company has to look for feasible combinations of infrastructure and stabilizing policies, i.e., squares and triangles, that are located below the \((\bar{T}=\tau )\)-line and as close as possible to the origin. If the customers’ willingness to pay for several levels of service quality is known, it is possible to quantify the company’s marginal revenue from increasing performance. Fig. 6 provides the marginal cost of decreasing delivery time, so we are able to determine the infrastructure that maximizes the company’s profit. For the parameters of Matternet, we find that a system with \(L=1\) depot can serve a customer in about 3 min on average in an area of \(\text{16 } \text{ km }^2\), at a cost of 60,000 US$. This time can be reduced to less than 1.5 min if the company spends more than 100,000 US$ on infrastructure (associated with \(L=4\) depots). If the company’s marginal revenue from reducing delivery time by 1.5 min (50%) is larger than approximately 40,000 US$, the four-depot configuration is better than the one-depot configuration. In other words, Fig. 6 shows the financial resources required to achieve a certain quality of service with a certain policy, and the additional resources needed to “buy” a shorter delivery time.
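The selection procedure described above amounts to a constrained minimization over the plotted configurations. The sketch below encodes it; the two data points are the approximate values quoted in the text for Matternet's parameters (about 3 min for 60,000 US$ with one depot, about 1.5 min for 100,000 US$ with four depots), and the names `Config`, `cheapest_feasible`, and `upgrade_pays_off` are hypothetical.

```python
from typing import List, NamedTuple, Optional

class Config(NamedTuple):
    label: str
    expenditure: float         # US$
    mean_delivery_time: float  # min

def cheapest_feasible(configs: List[Config], tau: float) -> Optional[Config]:
    """Cheapest configuration whose mean delivery time is at most tau
    (a point below the T = tau line and closest to the origin), or None."""
    feasible = [c for c in configs if c.mean_delivery_time <= tau]
    return min(feasible, key=lambda c: c.expenditure, default=None)

def upgrade_pays_off(base: Config, upgrade: Config, marginal_revenue: float) -> bool:
    """The upgrade is worth it if its marginal revenue exceeds its marginal cost."""
    return marginal_revenue > upgrade.expenditure - base.expenditure

one_depot = Config("L=1", 60_000, 3.0)      # approximate values from the text
four_depots = Config("L=4", 100_000, 1.5)
configs = [one_depot, four_depots]

print(cheapest_feasible(configs, tau=2.0).label)          # L=4
print(upgrade_pays_off(one_depot, four_depots, 50_000))   # True: 50,000 > 40,000
```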

8 Conclusions

This article addresses the high-level control and dimensioning of a drone-based delivery system using simulations and queuing theory. It was found that job assignment policies can exhibit a tipping point behavior: A stable system can immediately become unstable if one vehicle fails. An advanced job assignment policy is proposed that uses the increment in workload as the assignment metric: Each job is assigned to the vehicle whose workload increases the least by taking it. This policy, called first job to the vehicle with the smallest additional workload (\(\text{ FJW }\delta \)), leads to an optimal average delivery time for low loads and works very well up to high loads. It is scalable with the number of depots and vehicles. Simulation results indicate that the policy stabilizes the system for all loads if the number of vehicles per depot is sufficient. To account for the tipping point behavior, we show how to dimension a stable delivery system and resolve the trade-off between expenditure and service quality. The dimensioning considers two time horizons: long-term decisions on the number of depots to deploy in the service area and short-term decisions on the number of vehicles to use. Future work will analyze systems with inhomogeneous customers and real-world data.