1 Introduction

Last-mile deliveries in urban areas are becoming increasingly important, as highlighted, for example, in Boysen et al. (2020). Given a set of demand points, the goal is to determine routes for delivery vehicles that cover all demand points and minimize a generalized cost function. To avoid straining the urban infrastructure even further, fast, efficient and environmentally friendly solution approaches are needed. Thus, recent approaches focus on supplementing truck deliveries with additional transport modes that have a lower environmental impact and are well suited to urban settings. Examples include drones, other small autonomous vehicles and cargo bikes. Each kind of vehicle has its own properties and restrictions, resulting in different modeling approaches. Drones, e.g., have a very limited capacity but can be transported by a delivery truck, while cargo bikes have a significantly higher capacity but usually cannot be transported themselves. Distributing deliveries among multiple vehicles creates the need to hand over packages from one (larger) vehicle to the next (smaller) one. This can take place either at dedicated hubs, which might also serve as mini-depots, or at any location along the route of the larger vehicle. Introducing hubs as mini-depots allows the different delivery vehicles to be scheduled almost independently, but requires suitable locations and incurs building costs. Alternatively, handing over deliveries along the route requires synchronizing the routes of the different vehicles, which increases flexibility in response to demand fluctuations but results in a more involved model.

In this paper, we model and solve the problem in which deliveries are combined between a truck and a cargo bike and goods can be handed over at any stop on the route. More precisely, we have a set of customers, each with a specific demand; a truck depot that serves as the starting point for a delivery truck and stores all packages corresponding to the customers' demands; and a bike depot that serves as the starting point for one or multiple cargo bikes. Each vehicle supplies a subset of customers such that the demands of all customers are fulfilled. Due to the limited capacity of the cargo bike and the fact that the truck transports all packages, the bike has to meet up with the truck regularly to be restocked. This can be done at any customer location while the truck supplies the corresponding customer. The objective is to minimize the resulting generalized costs, which depend on time or distance. We illustrate the concept of combined truck-and-cargo-bike tours in Example 1.

Example 1

Figure 1 shows a truck tour (solid edges) and a bike tour (dashed edges). Both tours start and end at their respective depots (the black house in the middle is the cargo bike depot; the truck depot is on the left). All other nodes represent customer locations with unit demand. As the cargo bike has a capacity of two, the bike and truck tours are synchronized such that the first and the fifth customer locations of the truck tour serve as handover locations.

Assuming unit edge length, the truck tour has a length of nine, while the bike tour has a length of eight due to the synchronization at the second handover location. The completion time, i.e., the time until both vehicles have returned to the depot, is nine.

Fig. 1

Example of truck (solid edges) and bike tour (dashed edges) with bike capacity two and two handover locations

As no dedicated infrastructure has to be built for this delivery setting, this method can easily be implemented and is robust against changing demand. The main contributions of the paper can be summarized as follows:

  • we formulate the main problem and introduce MIP formulations for several variants with different cost functions, objectives and multiple cargo bikes,

  • we discuss the relationship to the traveling salesperson problem (TSP) as well as the capacitated vehicle routing problem, analyze the potential improvements compared to the TSP and provide approximation bounds,

  • we propose three solution algorithms based on clustering, savings and reinforcement learning,

  • and we evaluate the performance of all algorithms and MIP formulations on artificial and close-to-real-world instances.

In Sect. 2, we discuss the relevant literature and differentiate our setting from existing papers. We model the problem as a mixed-integer program in Sect. 3 and introduce several possible objective functions as well as problem variants. Three heuristic solution approaches are introduced in Sect. 4 and evaluated experimentally on artificial and close-to-real-world instances in Sect. 5. Here, we analyze how the instance size, the bike capacity and the speed differences between the vehicles influence the solution process and the structure of the solutions. Section 6 concludes the paper.

2 Literature review

City logistics has gained increasing attention in recent years as urbanization continues to intensify. It typically refers to the planning, organization and optimization of freight transportation and logistics activities within urban areas and includes the movement of goods, services and information (Bektas et al. 2015). In addressing the complex interactions between various stakeholders such as suppliers, carriers, retailers and consumers, more recent approaches assign a crucial role to the integration of advanced technologies such as intelligent transport systems, big data and artificial intelligence (Taniguchi et al. 2020). In this work, we address some of the key challenges that arise, including environmental sustainability, congestion and traffic management, and last-mile delivery (Anand et al. 2012; Russo and Comi 2020), with a focus on the latter.

There are many different types of last-mile delivery concepts in the literature; for an overview, we refer to Boysen et al. (2020). Often, these concepts are variants or extensions of the traveling salesperson problem (TSP) (Jünger et al. 1995; Applegate et al. 2011) or vehicle routing problem (VRP) (Toth and Vigo 2014). Some of those include time windows (Desrochers et al. 1992), backhauls (Goetschalckx and Jacobs-Blecha 1989), split deliveries (Dror and Trudeau 1990), stochastic demands (Bertsimas 1992), stochastic presence of customers (Gendreau et al. 1995) or stochastic travel times (Laporte et al. 1992; Malandraki and Daskin 1992). In other variants, a mixed fleet is considered, called heterogeneous fleet VRP (Baldacci et al. 2008), where the types of vehicles can differ in capacity, speed, variable and fixed costs, and the customers that they can access. For example, the truck-and-trailer routing problem (Chao 2002; Lin et al. 2011) involves managing a fleet comprising at least two vehicle types: normal trucks without trailers and truck-and-trailer combinations. While the latter is attractive due to its larger overall capacity, some customers can only be reached by a normal truck. A more general version of this is the VRP with Trailers and Transshipments (Drexl 2013) where there is no fixed assignment of trailers to trucks.

Another important variant is the pickup and delivery problem (Berbeglia et al. 2007), which can be extended by integrating a fixed route service as in Ghilas et al. (2016). Here, requests with pickup and drop-off locations, specified time windows and maximum ride times need to be scheduled. The orders can be fulfilled by a delivery truck alone or with the support of a fixed route service, which can cover part of a request's route using its spare capacity. The goal is to allocate vehicles, plan routes efficiently, and meet capacity and time window constraints while considering both trucks and fixed route services.

In the vehicle routing problem with cross-docking (Wen et al. 2009), a given set of pickup and delivery requests is addressed by a fleet of identical vehicles starting and ending at a cross-dock. They are used to drive both a pickup route to collect goods from suppliers and transport them to the cross-dock and, after unloading and reloading, a delivery route to deliver them to the corresponding customers.

The authors of Zäpfel and Bögl (2008) tackle the problem of local letter mail distribution. This involves simultaneous vehicle and driver routing and scheduling, taking into account constraints on working and driving times as well as the option of outsourcing vehicle routes to external carriers. Delivery routes, which transport shipments from the distribution center to the local post offices, and separate pickup routes for outbound shipments have to be planned over one week, allowing drivers and vehicles to be reused across multiple routes.

Our proposed approach belongs to the field of vehicle routing problems with multiple synchronization constraints. Besides the assignment of customers to the supplying vehicle, additional synchronization requirements in terms of time, location and load are necessary to model the related problems. Typically, there are different types of autonomous and non-autonomous vehicles, capacities, tasks or loads, and locations, such as customers, transfer locations or depots. We refer interested readers to Drexl (2012). The main difference between our approach and those in Drexl (2012) lies in the joint consideration of large (truck) and small (cargo bike) independent vehicles that both supply customers, where these customers can also serve as transfer locations, and in the synchronization of both vehicles in time.

Our model is related to both models for routing trucks and drones and two-echelon routing problems. Table 1 gives an overview of similarities and differences.

The flying sidekick traveling salesman problem (FSTSP) is introduced in Murray and Chu (2015) and consists of optimally assigning customers to a drone that supports a delivery truck. In each drone subtour, starting at the depot or at a customer location after synchronizing with the truck, the drone supplies exactly one customer within its limited flight endurance. To cover longer distances, to recharge, or to conserve battery power, the drone can be transported by the truck. To minimize the time required to serve all customers, the authors propose an MIP formulation as well as a route and re-assign heuristic. Another simple greedy heuristic for the FSTSP is presented in Crişan and Nechita (2019). Starting from a TSP tour, nodes are assigned to the drone in descending order of the corresponding time savings.

The authors of Agatz et al. (2018) propose the TSP with drone (TSP-D), a similar concept to FSTSP. Here, the truck can visit customers more than once as this could be useful for reloading or transporting the drone and can wait for the drone to return to the same node where the drone started. They present an operation-based IP formulation, where an operation represents part of a tour that contains at most one drone node. Since the number of operations grows exponentially with the number of nodes, the IP can only be solved for small instances, and the authors introduce a route-first cluster-second heuristic. In Bouman et al. (2018), the same authors presented an improved method as well as a variant that considers a subset of operations in order to reduce the computational time at the expense of accuracy.

Two alternative MIP formulations for the FSTSP are provided in Schermer et al. (2020). In addition, the authors propose a third MIP formulation, partly based on the concept of operations introduced in Agatz et al. (2018), with an exponential number of constraints, and use this formulation in a branch-and-cut approach.

A slightly different concept regarding the combination of trucks and drones is pursued in Amorosi et al. (2021). Instead of visiting nodes in the graph, a given percentage of the edge lengths of a set of graphs has to be inspected (i.e., visited) by a drone. To address this problem, the authors propose a nonlinear MIP formulation and a matheuristic.

In two-echelon routing problems (a special case of multi-echelon vehicle routing problems; Gonzalez-Feliu et al. 2008; Perboli et al. 2011), a distribution network with two levels (echelons) is considered; see Cuda et al. (2015) and Sluijk et al. (2023). Here, the primary vehicles start from a subset of predefined depots and transport goods to a subset of predefined handover locations, called satellites, which have their own capacities. From there, the secondary vehicles deliver the goods to the customers. While in some formulations the subsets of depots and satellites have to be selected (e.g., Contardo et al. 2012), in others one (Nguyen et al. 2012) or both (Hemmelmayr et al. 2012) are predefined. There are also variants in which both vehicles have to be at a satellite location at the same time because there is no way to store the goods (Grangier et al. 2016). A variant of the two-echelon routing problem with trucks and cargo bikes, similar to the concept we introduce, can be found in Anderluh et al. (2017). As in our concept, both vehicles start at their corresponding depots, deliver packages to customers and require synchronization in time and location to reload the cargo bike. The main difference from our approach is that the truck and bike nodes are predefined in Anderluh et al. (2017), which reduces the complexity. Moreover, the possible locations for reloading the cargo bike differ from the customer nodes and are determined a priori. In Anderluh et al. (2021), the authors extend this model by allowing a so-called gray zone in which the corresponding customers can be supplied by both types of vehicles.

Table 1 Overview of the related literature

In Appendices B.2 and B.1, we discuss the relationship to truck-and-drone models and to two-echelon routing models in more detail by showing how our model can be adapted to the problems described in Murray and Chu (2015) and Anderluh et al. (2017).

3 Model

To introduce our problem formally, we use the following notation. Let \(G'=(V,E')\) be a digraph, where \(v_{t} \in V\) and \(v_{b} \in V\) represent the truck depot and the bike depot, respectively. The remaining nodes \(v_1,..., v_n \in V\) serve as the customer locations with demand \(d(v_i) \in \mathbb {R}_{\ge 0}\) (w.l.o.g. \(d(v_i) > 0\)) for all \(i \in [n ]\). We set \(d(v_t) = d(v_b) = 0\) for completeness. Since we only use one truck, we assume that \(\sum _{v \in V} d(v) \le C_t\), where \(C_t\) denotes the truck capacity and \(C_b\) the bike capacity. Note that the demand at each node has to be served completely by one vehicle.

For each edge \(e = (v_i,v_j) \in E'\), we have truck weights \(c^t(e) = c^t(v_i, v_j)\) and bike weights \(c^b(e) = c^b(v_i, v_j)\), corresponding to the costs of the truck and the bike, respectively, to travel from \(v_i\) to \(v_j\). To simplify the notation for the remainder of the paper, we construct a complete digraph \(G=(V,E)\) from \(G'\) and define \(c^t(x, y):= c^t(sp^t(x, y))\) and \(c^b(x, y):= c^b(sp^b(x, y))\). Here \(sp^t(x,y)\) denotes the shortest path from x to y (in terms of truck costs \(c^t\) in \(G'\)) and \(sp^b(x,y)\) the shortest path in terms of bike costs \(c^b\) in \(G'\). If there exists no path between two nodes \(x'\) and \(y'\) for one of the two vehicles, we define \(c^t(x', y'):= \infty\) or \(c^b(x', y'):= \infty\), respectively. As a consequence, both cost functions \(c^t\) and \(c^b\) in the modified graph G satisfy the triangle inequality. To transfer a solution back to the original graph \(G'\), we store the corresponding shortest paths.
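The construction of the complete digraph G from G' can be sketched as follows. This is a minimal Python sketch under our own assumptions: the arc costs of G' for one vehicle are given as a dictionary, and we use the Floyd–Warshall algorithm as one possible all-pairs shortest-path method (the construction itself does not prescribe one). The names `metric_closure` and `shortest_path` are hypothetical; the next-hop table `nxt` plays the role of the stored shortest paths that map a solution back to G'.

```python
import math


def metric_closure(nodes, arcs):
    """Complete-digraph costs c(x, y) = cost of a shortest x-y path in G'.

    nodes: node labels; arcs: dict mapping arcs (u, v) of G' to a
    nonnegative cost for one vehicle. Unreachable pairs get math.inf.
    Also returns a next-hop table for recovering the paths in G'.
    """
    nodes = list(nodes)
    c = {(x, y): (0 if x == y else math.inf) for x in nodes for y in nodes}
    nxt = {}
    for (u, v), w in arcs.items():
        if w < c[(u, v)]:
            c[(u, v)] = w
            nxt[(u, v)] = v
    # Floyd-Warshall: allow node k as an intermediate stop
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if c[(i, k)] + c[(k, j)] < c[(i, j)]:
                    c[(i, j)] = c[(i, k)] + c[(k, j)]
                    nxt[(i, j)] = nxt[(i, k)]
    return c, nxt


def shortest_path(nxt, x, y):
    """Recover the stored shortest x-y path in G' (None if unreachable)."""
    if x == y:
        return [x]
    if (x, y) not in nxt:
        return None
    path = [x]
    while path[-1] != y:
        path.append(nxt[(path[-1], y)])
    return path


# Hypothetical truck network: "t" is the truck depot, (2, "t") is a one-way street,
# and the bike depot "b" is not connected to the truck network at all.
truck_arcs = {("t", 1): 2, (1, 2): 1, (2, "t"): 2, ("t", 2): 5}
ct, nxt_t = metric_closure(["t", "b", 1, 2], truck_arcs)
print(ct[("t", 2)], shortest_path(nxt_t, "t", 2))  # 3 ['t', 1, 2]
```

Because every pair of nodes gets the cost of a shortest path, the resulting costs satisfy the triangle inequality, and pairs without a path keep the value `math.inf`, matching the convention above.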

We call \((\mathcal {T}, \mathcal {B})\) a solution to the combined logistics problem, where \(\mathcal {T} = (t_0 = v_t, t_1,..., t_k = v_t)\) denotes the truck tour with start and end node \(t_0 = v_t\) and \(\mathcal {B} = (b_0 = v_b, b_1,..., b_l = v_b)\) the cargo bike tour. For the following notation, we assume that both tours \(\mathcal {T}\), \(\mathcal {B}\) are non-empty.

In particular, there must be at least one node \(v \in V\) with \(0 \ne d(v) \le C_b\); otherwise, the bike cannot be used. In addition, to call \((\mathcal {T}, \mathcal {B})\) a feasible solution, further properties are required.

The combined tour \((\mathcal {T}, \mathcal {B})\) has to cover the demand of all nodes \(v \in V\), i.e., each node has to be served by either the bike or the truck. Nodes visited by both vehicles, called combined nodes (\(\kappa _1,...,\kappa _m\)), are supplied by the truck and used to reload the cargo bike with the required goods. If \(v_t \ne v_b\), the bike starts without any goods. Therefore, the first node visited has to be a combined node (i.e., \(b_1 = \kappa _1\)). Since the cargo bike has only a limited capacity, the summed demands of the nodes between two successive combined nodes on the bike tour \(\mathcal {B}\) may not exceed \(C_b\).
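The feasibility conditions above can be checked for a given pair of tours with a short sketch. It assumes, hypothetically, that tours are Python lists of node labels that start and end at their depot, that `demand` maps every node (depots included) to its demand, and that each customer occurs at most once per tour; `is_feasible` is an illustrative name of our own. Every bike stop that also appears on the truck tour is treated as a combined node.

```python
def is_feasible(T, B, demand, C_b, v_t, v_b):
    """Check coverage, the first bike stop and the bike capacity for (T, B)."""
    truck_nodes, bike_nodes = set(T[1:-1]), set(B[1:-1])
    customers = {v for v in demand if v not in (v_t, v_b)}
    # every customer is served by the truck or the bike
    if not customers <= truck_nodes | bike_nodes:
        return False
    # with separate depots the bike starts empty, so its first stop must be combined
    if v_t != v_b and len(B) > 2 and B[1] not in truck_nodes:
        return False
    # demand of bike-only stops between two combined nodes must fit into the bike
    load = 0
    for v in B[1:-1]:
        if v in truck_nodes:  # combined node: served by the truck, bike is restocked
            load = 0
        else:
            load += demand[v]
            if load > C_b:
                return False
    return True


demand = {"t": 0, "b": 0, 1: 1, 2: 1, 3: 1, 4: 1}
T = ["t", 1, 2, "t"]
B = ["b", 1, 3, 4, 2, "b"]
print(is_feasible(T, B, demand, C_b=2, v_t="t", v_b="b"))  # True
```

If both depots coincide, the sketch lets the bike load at the depot, since the load counter starts at zero there; this mirrors the statement that only for \(v_t \ne v_b\) the bike starts without goods.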

Problem 2

Let a complete digraph \(G=(V,E)\) with edge costs \(c^t(e), c^b(e)\), \(e \in E\) for truck and cargo bike be given, as well as a bike capacity \(C_b\) and demand d(v), \(v \in V\), with no demand at the depots. The combined truck-and-cargo-bike routing problem (CTBRP) is to find a feasible combined tour \((\mathcal {T}, \mathcal {B})\) such that the generalized costs of the combined tour are minimized.

3.1 Cost structure

In practice, there are many different interpretations of the generalized costs that might be of interest, especially concerning tour durations and distance covered. We therefore introduce various objective functions in Sect. 3.2. From a theoretical standpoint, the most important distinction is between independent costs and synchronized costs.

  1. Independent costs: for some objectives, the costs of the truck tour and the bike tour can be computed separately and distributed to the edges as \(c^t(e)\) and \(c^b(e)\), respectively. This includes distance-based costs, i.e., a weighted sum of the distances covered by the truck and the cargo bike, but also emission-based costs.

  2. Synchronized costs: when the duration of one or both tours is minimized, it does not suffice to model the objective independently. At each combined stop, the synchronization of both tours has to be guaranteed, i.e., the time since the last combined stop has to be long enough for both the truck and the cargo bike to serve all intermediate stops.

For independent costs, we define the costs of the truck and cargo bike tour as

$$\begin{aligned} c^{t}(\mathcal {T}) := \sum _{i=0}^{k-1} c^t(t_{i}, t_{i+1}) \quad \text {and} \quad c^{b}(\mathcal {B}) := \sum _{i=0}^{l-1} c^b(b_{i}, b_{i+1}). \end{aligned}$$
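For independent costs, evaluating a tour is a plain sum over consecutive stops. The following sketch uses a small hypothetical instance: the labels `"t"` and `"b"` stand for \(v_t\) and \(v_b\), the customers are 1 to 4, and edge costs are stored only for the edges the tours use.

```python
def tour_cost(tour, cost):
    """Independent tour cost: sum of cost[(u, v)] over consecutive stops."""
    return sum(cost[(u, v)] for u, v in zip(tour, tour[1:]))


# Hypothetical instance: truck tour t -> 1 -> 2 -> t, bike tour b -> 1 -> 3 -> 4 -> 2 -> b
ct = {("t", 1): 3, (1, 2): 2, (2, "t"): 4}
cb = {("b", 1): 2, (1, 3): 1, (3, 4): 1, (4, 2): 2, (2, "b"): 1}
T = ["t", 1, 2, "t"]
B = ["b", 1, 3, 4, 2, "b"]
print(tour_cost(T, ct), tour_cost(B, cb))  # 9 7
```

A weighted distance-based objective would then simply combine the two values with nonnegative weights chosen by the planner.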

For synchronized costs, we take into account that the vehicles have to wait for each other at the combined nodes. Consequently, the duration between two successive combined nodes \(\kappa _i\), \(\kappa _{i+1}\) equals the cost of the slower vehicle (the vehicle with the higher summed costs) between the two nodes. To determine the cost of a tour, we sum these durations over all pairs of successive combined nodes and add the summed costs (of the corresponding vehicle) between the last combined node \(\kappa _m\) and the depot. To describe this, we extend the notation and define the truck costs of a path from node \(t_i\) to node \(t_j\) (\(i<j\)) along tour \(\mathcal {T} = (t_0 = v_t, t_1,..., t_k = v_t)\) by

$$\begin{aligned} c^t(t_i, t_j, \mathcal {T}):= \sum _{x = i}^{j-1} c^t(t_x, t_{x+1}). \end{aligned}$$

An equivalent definition applies to the bike costs regarding \(\mathcal {B}, b_i\) and \(b_j\) (\(i<j\)):

$$\begin{aligned} c^b(b_i, b_j, \mathcal {B}):= \sum _{x = i}^{j-1} c^b(b_x, b_{x+1}). \end{aligned}$$

Now we have

$$\begin{aligned} \hat{c}^t(\mathcal {T}):= \sum _{i=0}^{m-1} \max \left\{ c^t(\kappa _i, \kappa _{i+1}, \mathcal {T}), c^b(\kappa _i, \kappa _{i+1}, \mathcal {B}) \right\} + c^t(\kappa _m, v_t, \mathcal {T}), \end{aligned}$$
(1)

and similarly, for

$$\begin{aligned} \hat{c}^b(\mathcal {B}):= \sum _{i=0}^{m-1} \max \left\{ c^t(\kappa _i, \kappa _{i+1}, \mathcal {T}), c^b(\kappa _i, \kappa _{i+1}, \mathcal {B}) \right\} + c^b(\kappa _m, v_b, \mathcal {B}). \end{aligned}$$
(2)

We overload the notation and use \(\kappa _0:= v_t\) in \(c^t\) and \(\kappa _0:= v_b\) in \(c^b\).
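Definitions (1) and (2) can be evaluated directly for a given pair of tours. The sketch below assumes (our assumption, for illustration) that tours are lists of node labels, that each customer appears at most once per tour, and that the combined nodes appear in the same order on both tours; position 0 of each list plays the role of \(\kappa _0\). The instance is hypothetical.

```python
def seg_cost(tour, cost, i, j):
    """c(tour[i], tour[j], tour): cost of the sub-path between positions i <= j."""
    return sum(cost[(tour[x], tour[x + 1])] for x in range(i, j))


def hat_costs(T, B, ct, cb):
    """Synchronized costs (1) and (2) when both vehicles start at time 0."""
    bike_stops = set(B[1:-1])
    kappa = [v for v in T[1:-1] if v in bike_stops]  # combined nodes in truck order
    pt = [0] + [T.index(k) for k in kappa]           # position 0 is kappa_0 := v_t
    pb = [0] + [B.index(k) for k in kappa]           # position 0 is kappa_0 := v_b
    assert pb == sorted(pb), "combined nodes must appear in the same order on both tours"
    sync = sum(
        max(seg_cost(T, ct, pt[i], pt[i + 1]), seg_cost(B, cb, pb[i], pb[i + 1]))
        for i in range(len(kappa))
    )
    hat_t = sync + seg_cost(T, ct, pt[-1], len(T) - 1)
    hat_b = sync + seg_cost(B, cb, pb[-1], len(B) - 1)
    return hat_t, hat_b


ct = {("t", 1): 3, (1, 2): 2, (2, "t"): 4}
cb = {("b", 1): 2, (1, 3): 1, (3, 4): 1, (4, 2): 2, (2, "b"): 1}
T = ["t", 1, 2, "t"]
B = ["b", 1, 3, 4, 2, "b"]
print(hat_costs(T, B, ct, cb))  # (11, 8)
```

In this instance, the bike waits one unit at node 1 and the truck waits two units at node 2, so the completion time is \(\max \{11, 8\} = 11\).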

The preceding definition of the synchronized costs assumes that both vehicles start at the same time. This is appropriate for objective functions minimizing the delivery period, also known as the completion time in the literature (e.g., Agatz et al. 2018). If we want to minimize the longest tour instead, the definition needs a slight modification: we can assume that the bike and the truck arrive at the first combined node \(\kappa _1\) at the same time, since the faster vehicle can simply start later. Accordingly, for the first leg we add the summed costs (of the corresponding vehicle) between the depot and the first combined node \(\kappa _1\) instead of the maximum over both vehicles. This results in the following definition:

$$\begin{aligned} \tilde{c}^t(\mathcal {T}):= \sum _{i=1}^{m-1} \max \left\{ c^t(\kappa _i, \kappa _{i+1}, \mathcal {T}), c^b(\kappa _i, \kappa _{i+1}, \mathcal {B}) \right\} + c^t(v_t, \kappa _1, \mathcal {T}) + c^t(\kappa _m, v_t, \mathcal {T}), \end{aligned}$$
(3)

and similarly, for

$$\begin{aligned} \tilde{c}^b(\mathcal {B}):= \sum _{i=1}^{m-1} \max \left\{ c^t(\kappa _i, \kappa _{i+1}, \mathcal {T}), c^b(\kappa _i, \kappa _{i+1}, \mathcal {B}) \right\} + c^b(v_b, \kappa _1, \mathcal {B}) + c^b(\kappa _m, v_b, \mathcal {B}). \end{aligned}$$
(4)

Note that if \(v_t \ne v_b\), the first node the cargo bike visits after the depot has to be a combined node. In this case, \(c^b(v_b, \kappa _1, \mathcal {B}) = c^b(v_b, b_1) = c^b(v_b, \kappa _1)\) applies.

Since each maximum is at least the cost of either vehicle on the respective leg, \(\hat{c}^t(\mathcal {T}) \ge \tilde{c}^t(\mathcal {T}) \ge c^{t}(\mathcal {T})\) and \(\hat{c}^b(\mathcal {B}) \ge \tilde{c}^b(\mathcal {B}) \ge c^{b}(\mathcal {B})\) hold.
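The chain of inequalities can be checked numerically. This sketch computes the plain costs, definitions (3) and (4), and definitions (1) and (2) on a hypothetical instance; as before, tours are lists of node labels, customers occur at most once per tour, and at least one combined node exists (\(m \ge 1\)).

```python
def seg_cost(tour, cost, i, j):
    """Cost of the sub-path of `tour` between positions i <= j."""
    return sum(cost[(tour[x], tour[x + 1])] for x in range(i, j))


def all_costs(T, B, ct, cb):
    """Return (c, tilde c, hat c), each as a (truck, bike) pair."""
    bike_stops = set(B[1:-1])
    kappa = [v for v in T[1:-1] if v in bike_stops]  # combined nodes kappa_1..kappa_m
    pt = [T.index(k) for k in kappa]
    pb = [B.index(k) for k in kappa]
    # synchronized legs kappa_i -> kappa_{i+1} for i >= 1
    legs = sum(max(seg_cost(T, ct, pt[i], pt[i + 1]), seg_cost(B, cb, pb[i], pb[i + 1]))
               for i in range(len(kappa) - 1))
    head_t, head_b = seg_cost(T, ct, 0, pt[0]), seg_cost(B, cb, 0, pb[0])
    tail_t = seg_cost(T, ct, pt[-1], len(T) - 1)
    tail_b = seg_cost(B, cb, pb[-1], len(B) - 1)
    plain = (seg_cost(T, ct, 0, len(T) - 1), seg_cost(B, cb, 0, len(B) - 1))
    tilde = (legs + head_t + tail_t, legs + head_b + tail_b)  # (3), (4)
    first = max(head_t, head_b)                                # i = 0 term of (1), (2)
    hat = (legs + first + tail_t, legs + first + tail_b)       # (1), (2)
    return plain, tilde, hat


ct = {("t", 1): 3, (1, 2): 2, (2, "t"): 4}
cb = {("b", 1): 2, (1, 3): 1, (3, 4): 1, (4, 2): 2, (2, "b"): 1}
plain, tilde, hat = all_costs(["t", 1, 2, "t"], ["b", 1, 3, 4, 2, "b"], ct, cb)
print(plain, tilde, hat)  # (9, 7) (11, 7) (11, 8)
```

Here only the bike profits from a later start: its cost drops from 8 under (2) to 7 under (4), because it no longer waits for the truck at the first combined node.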

3.2 MIP formulations

In this section, we present our main MIP formulation for the combined truck-and-cargo-bike routing problem with one truck and one bike. It is based on synchronized (time-based) costs, and the objective is to minimize the completion time (tbc_mct). Furthermore, we describe variations of this formulation, taking into account different cost structures, objective functions and other relevant properties. Note that the MIP formulations only consider solutions \((\mathcal {T}, \mathcal {B})\) with \(\mathcal {T} \ne \emptyset\) and \(\mathcal {B}\ne \emptyset\). Therefore, an optimal solution of the MIP formulation has to be compared with optimal solutions that use only the truck or only the bike, each of which can be computed by solving a TSP.

Starting with the main formulation (tbc_mct), we define binary variables \(x_{(v, w)}^t\), \(x_{(v, w)}^b\), \(x_{v}^t\) and \(x_{v}^b\), which indicate, respectively, whether the corresponding edge \((v,w) \in E\) or node \(v \in V\) is on the truck or bike tour.

The variables \(d_v\) represent the costs of the respective tour starting at \(v_t\) or \(v_b\) up to node \(v \in V\), taking into account that both vehicles have to wait for each other at the combined nodes. More intuitively, \(d_v\) represents the time when the respective vehicle reaches node v or the time of the slower vehicle in case v is visited by both the truck and cargo bike. Regardless of the cost function, we need these variables to ensure that both vehicles visit the combined nodes in the same order.
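The intended values of \(d_v\) for a given solution can be obtained by simulating both tours, letting each vehicle wait at a combined node until the slower one arrives. The sketch below is illustrative only: the function names are hypothetical, it assumes \(v_t \ne v_b\), that the combined nodes appear in the same order on both tours, and that no edge is used by both vehicles. It then checks constraints (15)–(17) on the arcs the tours use; on unused arcs, (15) and (16) reduce to \(0 \le d_v\) and (17) is relaxed by M.

```python
def arrival_times(T, B, ct, cb):
    """d_v: arrival time at v, or the later arrival at a combined node.

    Both vehicles start at time 0; d at a depot is the return time.
    """
    bike_stops = set(B[1:-1])
    kappa = [v for v in T[1:-1] if v in bike_stops]
    d, time_t, time_b, it, ib = {}, 0, 0, 0, 0
    for k in kappa + [None]:  # None marks the final legs back to the depots
        jt = T.index(k) if k is not None else len(T) - 1
        jb = B.index(k) if k is not None else len(B) - 1
        for x in range(it, jt):
            time_t += ct[(T[x], T[x + 1])]
            d[T[x + 1]] = time_t
        for x in range(ib, jb):
            time_b += cb[(B[x], B[x + 1])]
            d[B[x + 1]] = time_b
        if k is not None:  # both leave k only after the slower vehicle has arrived
            time_t = time_b = d[k] = max(time_t, time_b)
        it, ib = jt, jb
    return d


def check_15_to_17(T, B, ct, cb, d, v_t, v_b):
    """Check (15)-(17) on the arcs used by the tours."""
    arcs_t, arcs_b = set(zip(T, T[1:])), set(zip(B, B[1:]))
    for (v, w) in arcs_t | arcs_b:
        if v == v_t and ct[(v, w)] > d[w]:  # (15)
            return False
        if v == v_b and cb[(v, w)] > d[w]:  # (16)
            return False
        if v not in (v_t, v_b):             # (17)
            lhs = (ct[(v, w)] if (v, w) in arcs_t else 0) + (cb[(v, w)] if (v, w) in arcs_b else 0)
            if lhs > d[w] - d[v]:
                return False
    return True


ct = {("t", 1): 3, (1, 2): 2, (2, "t"): 4}
cb = {("b", 1): 2, (1, 3): 1, (3, 4): 1, (4, 2): 2, (2, "b"): 1}
T, B = ["t", 1, 2, "t"], ["b", 1, 3, 4, 2, "b"]
d = arrival_times(T, B, ct, cb)
print(d)  # {1: 3, 2: 7, 3: 4, 4: 5, 't': 11, 'b': 8}
```

With these values, \(z = \max \{d_{v_t}, d_{v_b}\} = 11\) satisfies (6), which is the completion time of this solution.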

Last, we define variables \(\ell _v\) for \(v \in V\) to take the limited bike capacity \(C_b\) into account. The value of \(\ell _v\) describes the number of goods delivered from the bike up to and including node v, starting after the last reloading at a combined node. The resulting MIP is shown in (5)–(19).

Constraints (10)–(11) ensure that each truck and bike node has exactly one incoming and outgoing edge on the corresponding tour. With constraints (7), both tours start at their respective depot, and with (9), each node is supplied by at least one of the two vehicles. If \(v_t \ne v_b,\) constraints (12) ensure that the first node visited by the bike is a truck node; otherwise, (12) is not necessary.

$$\begin{aligned} \min\;\; z \end{aligned}$$
(5)
$$\begin{aligned} \mathrm{s.t.}\ {} \;\; d_{v_t},d_{v_b} \le z \end{aligned}$$
(6)
$$\begin{aligned}&x_{v_t}^t, x_{v_b}^b = 1 \end{aligned}$$
(7)
$$\begin{aligned}&x_{v_t}^b ,x_{v_b}^t = 0 \end{aligned}$$
(8)
$$\begin{aligned}&1 \le x_{v}^t + x_{v}^b \quad \forall \; v \in V \end{aligned}$$
(9)
$$\begin{aligned}&\sum _{\begin{array}{c} w \in V \\ w \ne v \end{array}} x_{(v, w)}^t = \sum _{\begin{array}{c} w \in V \\ w \ne v \end{array}} x_{(w,v)}^t= x_{v}^t \quad \forall \; v \in V \end{aligned}$$
(10)
$$\begin{aligned}&\sum _{\begin{array}{c} w \in V \\ w \ne v \end{array}} x_{(v, w)}^b = \sum _{\begin{array}{c} w \in V \\ w \ne v \end{array}} x_{(w,v)}^b = x_{v}^b \quad \forall \; v \in V \end{aligned}$$
(11)
$$\begin{aligned}&x_{(v_b, v)}^b \le x_{v}^t \quad \begin{array}{c} \forall \; v \in V \\ v \ne v_b \end{array} \end{aligned}$$
(12)
$$\begin{aligned}&\ell _v \le (1-x_{v}^t) \cdot C_b \quad \forall \; v \in V \end{aligned}$$
(13)
$$\begin{aligned}&\ell _v + d(w) - \ell _w \le (1-x_{(v, w)}^b + x_{w}^t) \cdot ( C_b + \max _{v' \in V} \{d(v') \} ) \quad \forall \; (v,w) \in E \end{aligned}$$
(14)
$$\begin{aligned}&x_{(v_t, v)}^t \cdot c^t(v_t, v) \le d_{v} \quad \forall \; v \in V \end{aligned}$$
(15)
$$\begin{aligned}&x_{(v_b, v)}^b \cdot c^b(v_b, v) \le d_{v} \quad \forall \; v \in V \end{aligned}$$
(16)
$$\begin{aligned}&c^t(v, w) \cdot x_{(v, w)}^t + c^b(v, w) \cdot x_{(v, w)}^b \le d_{w} - d_{v} + (1-x_{(v, w)}^t-x_{(v, w)}^b) \cdot M \quad \begin{array}{c} \forall \; (v,w) \in E \\ v \ne v_t,v_b \end{array} \end{aligned}$$
(17)
$$\begin{aligned}&x_{e}^t, x_{e}^b, x_{v}^t, x_{v}^b \in \{0,1\} \quad \forall \; e \in E, v \in V \end{aligned}$$
(18)
$$\begin{aligned}&d_v, \ell _v \ge 0 \quad \forall \; v \in V \end{aligned}$$
(19)

Lemma 3

Constraints (13)–(14) ensure that the cargo bike visits a truck node for reloading (if necessary) without exceeding the capacity.

Proof

The value of \(\ell _v\) describes the number of goods delivered by the bike up to and including node v since the last reloading at a combined node. Consequently, this value may not exceed \(C_b\) and is set to 0 at each truck node, in particular at all combined nodes (all combined nodes are served by the truck). Constraints (13) ensure the latter: if v is a truck node, then \(x_{v}^t = 1\) and therefore \(0 \le \ell _v \le 0\); otherwise, \(x_{v}^t = 0\) and \(\ell _v \le C_b\). If the bike drives from v to w (i.e., \(x_{(v, w)}^b = 1\)) and w is not a combined node (in particular, not a truck node, i.e., \(x_{w}^t = 0\)), constraints (14) yield \(\ell _v + d(w) \le \ell _w.\) In every other case, we get \((1-x_{(v, w)}^b + x_{w}^t) \ge 1\), and since

$$\begin{aligned} \ell _v \le C_b \quad \text {and} \quad d(w) \le \max _{v' \in V} \{d(v') \} \end{aligned}$$

holds, we get

$$\begin{aligned} \underbrace{\ell _v}_{\le C_b} + \underbrace{d(w)}_{\le \max _{v' \in V} \{d(v') \}} \le \ell _w + \underbrace{(1-x_{(v, w)}^b + x_{w}^t)}_{\ge 1} \cdot ( C_b + \max _{v' \in V} \{d(v') \} ), \end{aligned}$$

which is satisfied for every \(\ell _w \ge 0\). Consequently, constraints (14) impose no relevant bound on \(\ell _w\) in this case. \(\square\)
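The argument of the proof can be replayed on a concrete tour. This sketch, with hypothetical helper names, assigns \(\ell _v\) along the bike tour (reset to 0 at truck nodes, otherwise increased by \(d(w)\)) and then checks (13) and (14) for every ordered pair of distinct nodes, using the constant \(C_b + \max _{v' \in V} d(v')\) from (14). Nodes that the bike does not visit are given \(\ell _v = 0\), which is our assumption for the illustration.

```python
def bike_loads(B, truck_stops, demand):
    """ell_v along the bike tour: demand delivered since the last truck node."""
    ell = {B[0]: 0}
    for v, w in zip(B, B[1:]):
        # reset at truck nodes as in (13); otherwise add d(w) as required by (14)
        ell[w] = 0 if w in truck_stops else ell[v] + demand[w]
    return ell


def check_13_14(V, B, truck_stops, demand, C_b, ell):
    """Verify (13) and (14) for all ordered pairs of distinct nodes."""
    bike_arcs = set(zip(B, B[1:]))
    big = C_b + max(demand.values())
    for v in V:
        l_v = ell.get(v, 0)  # nodes off the bike tour take ell_v = 0
        if l_v > (0 if v in truck_stops else C_b):                  # (13)
            return False
        for w in V:
            if w == v:
                continue
            factor = 1 - ((v, w) in bike_arcs) + (w in truck_stops)
            if l_v + demand[w] - ell.get(w, 0) > factor * big:      # (14)
                return False
    return True


demand = {"t": 0, "b": 0, 1: 1, 2: 1, 3: 1, 4: 1}
V = ["t", "b", 1, 2, 3, 4]
T, B = ["t", 1, 2, "t"], ["b", 1, 3, 4, 2, "b"]
ell = bike_loads(B, set(T), demand)
print(ell, check_13_14(V, B, set(T), demand, 2, ell))  # {'b': 0, 1: 0, 3: 1, 4: 2, 2: 0} True
```

With \(C_b = 1\), the same loads violate (13) at node 4, which matches the fact that the bike would have to deliver two units between the combined nodes 1 and 2.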

Lemma 4

The MIP formulation in (5)–(19) ensures that both vehicles meet at the same time at a combined node and takes the resulting waiting times into account.

Proof

The proof can be found in Appendix A. \(\square\)

Note that in this formulation, both vehicles start at the same time. Consequently, objective (5) minimizes the completion time due to constraints (6).

Lemma 5

Constraints (17) serve as subtour elimination constraints for the truck and bike tour, respectively.

Proof

The proof can be found in Appendix A. \(\square\)

For all formulations that include synchronization between the vehicles, we need to choose M in constraints (17) large enough. In particular, since \(d_w \ge 0\), M has to satisfy

$$\begin{aligned} d_{v} - d_{w} + \max _{(v',w') \in E} \{c^t(v', w'), c^b(v', w') \} \le \max _{v' \in V} \{d_{v'} \} + \max _{(v',w') \in E} \{c^t(v', w'), c^b(v', w') \} \le M, \end{aligned}$$

if the corresponding \(x_{(v, w)}^t\) or \(x_{(v, w)}^b\) is equal to zero.

While \(\max _{(v,w) \in E} \{c^t(v, w), c^b(v, w)\}\) is easy to determine a priori, this is not readily possible for \(\max _{v \in V} \{ d_{v} \}\). Consequently, we have to estimate the latter by, for example, \(\sum _{e \in E } \max \{ c^t(e), c^b(e) \}\).
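The resulting estimate for M, i.e., \(\sum _{e \in E} \max \{c^t(e), c^b(e)\} + \max _{e \in E} \max \{c^t(e), c^b(e)\}\), is straightforward to compute. The sketch assumes, for illustration, that both cost dictionaries are defined on the same edge set and contain only finite values (edges with infinite cost would have to be excluded beforehand); `big_m` is a hypothetical name.

```python
def big_m(ct, cb):
    """Estimate of max d_v plus the largest edge cost, used as M in (17)."""
    per_edge = [max(ct[e], cb[e]) for e in ct]
    return sum(per_edge) + max(per_edge)


ct = {(1, 2): 3, (2, 1): 1}
cb = {(1, 2): 2, (2, 1): 4}
print(big_m(ct, cb))  # 11
```

A smaller valid M tends to give a tighter LP relaxation, so in practice one may replace the crude sum with any other upper bound on the completion time, for example the duration of a known feasible solution.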

3.2.1 Time-based costs minimizing longest tour (tbc_mlt)

To ensure that the bike and the truck always arrive at the first combined node \(\kappa _1\) at the same time, our MIP needs a slight modification. We add continuous variables \(g^t\) and \(g^b\) that indicate the gap between the departure times of the first and the second vehicle. Then we have \(g^t > 0, g^b = 0\) or \(g^t = 0, g^b > 0\). In addition, we replace constraints (15), (16) and (6) by

$$\begin{aligned}&d_{v_t} - g^t \le z \end{aligned}$$
(20)
$$\begin{aligned}&d_{v_b} - g^b \le z \end{aligned}$$
(21)
$$\begin{aligned}&x_{(v_t, v)}^t \cdot c^t(v_t, v) + g^t \le d_{v} \quad \forall \; v \in V \end{aligned}$$
(22)
$$\begin{aligned}&x_{(v_b, v)}^b \cdot c^b(v_b, v) + g^b \le d_{v} \quad \forall \; v \in V \end{aligned}$$
(23)
$$\begin{aligned}&g^t, g^b \ge 0. \end{aligned}$$
(24)

Under this assumption, it follows that \(d_{v_t} - g^t\) equals the truck tour costs in (3) and \(d_{v_b} - g^b\) equals the bike tour costs in (4). Consequently, the objective function (5) minimizes the maximum of both tour costs due to constraints (20) and (21).
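The role of the gap variables can be illustrated on a fixed solution. In this sketch (an illustration under our own assumptions, with a hypothetical function name), the vehicle that would reach \(\kappa _1\) first departs later by the difference, so both arrive there together; the return times then satisfy \(d_{v_t} - g^t = \tilde{c}^t(\mathcal {T})\) and \(d_{v_b} - g^b = \tilde{c}^b(\mathcal {B})\).

```python
def seg_cost(tour, cost, i, j):
    """Cost of the sub-path of `tour` between positions i <= j."""
    return sum(cost[(tour[x], tour[x + 1])] for x in range(i, j))


def mlt_schedule(T, B, ct, cb):
    """Start gaps g^t, g^b and return times d_{v_t}, d_{v_b} for tbc_mlt."""
    bike_stops = set(B[1:-1])
    kappa = [v for v in T[1:-1] if v in bike_stops]
    pt = [T.index(k) for k in kappa]
    pb = [B.index(k) for k in kappa]
    head_t, head_b = seg_cost(T, ct, 0, pt[0]), seg_cost(B, cb, 0, pb[0])
    g_t, g_b = max(0, head_b - head_t), max(0, head_t - head_b)
    time = max(head_t, head_b)       # common arrival time at kappa_1
    for i in range(len(kappa) - 1):  # wait for the slower vehicle on each leg
        time += max(seg_cost(T, ct, pt[i], pt[i + 1]), seg_cost(B, cb, pb[i], pb[i + 1]))
    d_vt = time + seg_cost(T, ct, pt[-1], len(T) - 1)
    d_vb = time + seg_cost(B, cb, pb[-1], len(B) - 1)
    return g_t, g_b, d_vt, d_vb


ct = {("t", 1): 3, (1, 2): 2, (2, "t"): 4}
cb = {("b", 1): 2, (1, 3): 1, (3, 4): 1, (4, 2): 2, (2, "b"): 1}
g_t, g_b, d_vt, d_vb = mlt_schedule(["t", 1, 2, "t"], ["b", 1, 3, 4, 2, "b"], ct, cb)
print(g_t, g_b, d_vt - g_t, d_vb - g_b)  # 0 1 11 7
```

In this instance, the bike departs one unit later, and the objective value \(z = \max \{11, 7\} = 11\) is the length of the longest tour.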

3.2.2 Time-based costs minimizing summed tour durations (tbc_mst)

To minimize the summed tour durations with vehicles starting at the same time, we can modify our MIP by removing constraints (6) and replacing the objective function (5) by

$$\begin{aligned} d_{v_t} + d_{v_b}. \end{aligned}$$
(25)

Remark

If vehicles do not have to start at the same time, we instead use the previous formulation (tbc_mlt), replace the objective (5) by

$$\begin{aligned} d_{v_t} + d_{v_b} - g^b- g^t, \end{aligned}$$
(26)

and remove constraints (20) and (21).

3.2.3 Distance-based costs with synchronization (dbc_ws)

The calculation of the tour costs becomes much easier if \(c^t(v,w)\) and \(c^b(v,w)\) represent distance-based costs between node v and node w. As mentioned at the beginning of this section, we can simply sum up the costs of all edges used since there is no need to consider waiting times. Nevertheless, we cannot drop the variables \(d_v\) and the corresponding constraints, which ensure that both vehicles visit the combined nodes in the same order. Hence, we only remove (6) and replace the objective (5) in (5)–(19) by

$$\begin{aligned} \sum _{(v,w) \in E} \left( x^t_{(v,w)} \cdot c^t(v,w) + x^b_{(v,w)} \cdot c^b(v,w)\right) . \end{aligned}$$
(27)

3.2.4 Distance-based costs without synchronization (dbc_os)

By assuming that the truck can safely deposit the goods at the combined nodes until the bike arrives, the bike and truck do not need to be at a combined node at the same time to reload. Therefore, we can remove variables \(d_v\) and the associated constraints in the above formulation.

As a consequence, we have to add extra constraints to eliminate subtours in the bike and the truck tour. Those from the Miller–Tucker–Zemlin formulation in Miller et al. (1960) are suitable, as they remain valid if we allow multiple bikes.
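For reference, one possible way to write the textbook Miller–Tucker–Zemlin constraints in our notation is sketched below. The auxiliary order variables \(u^t_v\) and \(u^b_v\) are introduced here for illustration only and are not part of the formulation above.

$$\begin{aligned}&u^t_v - u^t_w + |V| \cdot x^t_{(v,w)} \le |V| - 1 \quad \begin{array}{c} \forall \; (v,w) \in E \\ v,w \ne v_t \end{array} \\&u^b_v - u^b_w + |V| \cdot x^b_{(v,w)} \le |V| - 1 \quad \begin{array}{c} \forall \; (v,w) \in E \\ v,w \ne v_b \end{array} \\&1 \le u^t_v \le |V| - 1 \quad \forall \; v \in V \setminus \{v_t\}, \qquad 1 \le u^b_v \le |V| - 1 \quad \forall \; v \in V \setminus \{v_b\} \end{aligned}$$

If \(x^t_{(v,w)} = 1\), the first constraint gives \(u^t_w \ge u^t_v + 1\), so the labels strictly increase along the tour and no cycle avoiding \(v_t\) can exist; if \(x^t_{(v,w)} = 0\), the constraint is slack because two labels differ by at most \(|V| - 2\). Since the bike depot is excluded from the bike constraints, several bike routes leaving \(v_b\) can be labeled independently, which is why this form also works with multiple bikes.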

3.2.5 MIP model with multiple bikes

To model multiple bikes, we can extend the previous MIP formulations by slightly modifying constraints (11), similar to MIP formulations of the vehicle routing problem. Instead of exactly one, we can fix or bound the number of in- and outgoing bike edges at the bike depot by any constant number \(B^*\) of allowed cargo bikes. Constraints (28)–(29) implement the latter and replace constraints (11).

$$\begin{aligned}&\sum _{\begin{array}{c} w \in V \\ w \ne v \end{array}} x_{(v, w)}^b = \sum _{\begin{array}{c} w \in V \\ w \ne v \end{array}} x_{(w,v)}^b = x_{v}^b \quad \begin{array}{c} \forall \; v \in V \\ v \ne v_b \end{array} \end{aligned}$$
(28)
$$\begin{aligned}&\sum _{\begin{array}{c} w \in V \\ w \ne v_b \end{array}} x_{(v_b, w)}^b = \sum _{\begin{array}{c} w \in V \\ w \ne v_b \end{array}} x_{(w,v_b)}^b \le B^* \end{aligned}$$
(29)
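To make constraints (28)–(29) concrete, the following Python sketch checks whether a given set of bike edges satisfies them. The data layout (edges as node pairs, the set of bike-served nodes standing in for \(x^b_v = 1\)) and the function name are assumptions for illustration; subtour elimination and the bike capacity are not checked here.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def satisfies_bike_degree_constraints(bike_edges: Iterable[Edge], bike_nodes: Set[str],
                                      depot: str, max_bikes: int) -> bool:
    """Check (28) and (29) for a candidate bike solution."""
    edges = list(bike_edges)
    out_deg = Counter(v for v, _ in edges)
    in_deg = Counter(w for _, w in edges)
    # (28): every non-depot node has in-degree = out-degree = x^b_v
    nodes = {v for e in edges for v in e} | set(bike_nodes)
    for v in nodes - {depot}:
        x_v = 1 if v in bike_nodes else 0
        if out_deg[v] != x_v or in_deg[v] != x_v:
            return False
    # (29): as many bikes leave the depot as return, and at most B* of them
    return out_deg[depot] == in_deg[depot] <= max_bikes


# Two bike tours (vb, a, vb) and (vb, b, c, vb).
edges = [("vb", "a"), ("a", "vb"), ("vb", "b"), ("b", "c"), ("c", "vb")]
print(satisfies_bike_degree_constraints(edges, {"a", "b", "c"}, "vb", max_bikes=2))  # True
```

With \(B^* = 1\), the same edge set would be rejected, since two bikes leave the depot.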

Note that when using more than one bike, \(d_{v_b}\) denotes the time when the last bike returns to the depot node. Therefore, we can only minimize the completion time or use the distance-based versions, since we cannot distinguish between different cargo bikes. Consequently, it is not possible to consider multiple bikes with different capacities in this model; to do so, we would have to use separate variables for each cargo bike.

3.2.6 Remarks and further variations

The different objective functions with the corresponding generalized costs are summarized in Table 2. As a reminder, \(c^b(\mathcal {B})\) (\(c^t(\mathcal {T})\)) correspond to the bike (truck) costs without waiting times, \(\tilde{c}^b(\mathcal {B})\) (\(\tilde{c}^t(\mathcal {T})\)) are the bike (truck) costs when both start independently, but wait for each other at the combined nodes and \(\hat{c}^b(\mathcal {B})\) (\(\hat{c}^t(\mathcal {T})\)) are the bike (truck) costs when both wait for each other at the combined nodes and start at the same time.

Table 2 Objectives and corresponding generalized costs

Although minimizing carbon emissions may seem reasonable, we have not included this objective function in Table 2, because it amounts to minimizing the truck tour duration, which in most cases results in a very long and expensive bike tour.

Further variations of the model, especially reformulations of models from Anderluh et al. (2017) and Murray and Chu (2015), can be found in Appendix B.3.

Moreover, in Appendix C, we discuss the relationship to the traveling salesperson problem as well as the capacitated vehicle routing problem and provide results on approximating the improvement of optimal solutions compared to TSP solutions.

4 Solution approaches

As CTBRP is NP-hard, we cannot expect standard MIP solvers to solve realistically sized instances to optimality within reasonable time, as the experiments in Sect. 5 confirm. Therefore, we introduce different heuristic approaches here.

4.1 Clustering-based heuristics

We start with a simple heuristic by clustering the customers first and subsequently calculating combined tours in and between the clusters.

An easy first idea is to use a standard clustering algorithm, e.g., k-means clustering based on the cost function, to cluster the customers. After that, we calculate an optimal combined tour in each cluster, which is tractable as long as the number of customers in each cluster is sufficiently small. Finally, we link the local solutions through a truck and a bike tour.

We do not describe this approach in more detail because of its obvious weakness: good local solutions can lead to a bad global solution, as the following example illustrates.

Example 6

Consider the instance on the left side of Fig. 2. For x large enough, two clusters A and B result. Both have the same structure, as shown on the right side in Fig. 2.

Fig. 2 Depot and two clusters (left) and the structure of a cluster (right) with truck costs (red) and bike costs (blue) (color figure online)

In an optimal solution, the truck supplies the customers in cluster A and the bike those in cluster B (if \(C_b = 3\), \(v_D = v_t = v_b, 0< \epsilon < 1\), and the demand is equal to 1 for all customers). However, since the local optimal solutions in the clusters use both the bike (\(\mathcal {B}_A = (A_1, A_2, A_1), \mathcal {B}_B = (B_1, B_2, B_1)\)) and the truck (\(\mathcal {T}_A = (A_1, A_3, A_1), \mathcal {T}_B = (B_1, B_3, B_1)\)), both clusters are served by both vehicles in the resulting global solution as well. For \(\epsilon \longrightarrow 0\), the costs of this solution tend to infinity, but those of an optimal solution remain the same.

Thus, we propose an alternative clustering approach: after creating the clustering, we first calculate the combined tour between the clusters and subsequently the corresponding solutions within the clusters. To keep the size and number of clusters small enough, both steps can be applied recursively. We call this algorithm Heuristic-Clustering and give a more detailed description in Algorithm 1.

Algorithm 1 Heuristic-Clustering

Note that a dummy node (corresponding to a cluster center) can only be served by the bike if the corresponding summed demands of the nodes contained in that cluster do not exceed the bike capacity. Depending on the clustering, certain clusters are thus assigned to the truck a priori. To compute the shortest Hamiltonian path between two combined nodes, we use simulated annealing.
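The simulated annealing variant used for the Hamiltonian paths between combined nodes is not specified here. The following Python sketch shows one common choice under stated assumptions: 2-opt segment reversals as the neighborhood, the Metropolis acceptance rule, geometric cooling and illustrative parameter values. The two endpoints (the combined nodes) stay fixed, and the best path seen is returned. All names and parameter values are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Cost = Callable[[int, int], float]


def path_cost(path: Sequence[int], cost: Cost) -> float:
    """Total cost of consecutive edges along a path."""
    return sum(cost(v, w) for v, w in zip(path[:-1], path[1:]))


def sa_hamiltonian_path(start: int, end: int, inner: List[int], cost: Cost,
                        iters: int = 20000, t0: float = 10.0,
                        cooling: float = 0.9995, seed: int = 0) -> List[int]:
    """Simulated annealing for a short Hamiltonian path start -> inner nodes -> end."""
    rng = random.Random(seed)
    current = [start] + list(inner) + [end]
    if len(inner) < 2:
        return current  # nothing to reorder
    cur_cost = path_cost(current, cost)
    best, best_cost = current[:], cur_cost
    temp = t0
    for _ in range(iters):
        # 2-opt move: reverse a segment strictly between the fixed endpoints
        i, j = sorted(rng.sample(range(1, len(current) - 1), 2))
        cand = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        cand_cost = path_cost(cand, cost)
        delta = cand_cost - cur_cost
        # Metropolis criterion: always accept improvements, sometimes worse paths
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        temp *= cooling  # geometric cooling schedule
    return best


pos = {v: float(v) for v in range(6)}  # nodes 0..5 on a line


def line_cost(v: int, w: int) -> float:
    return abs(pos[v] - pos[w])


path = sa_hamiltonian_path(0, 5, [3, 1, 4, 2], line_cost)
print(path, path_cost(path, line_cost))
```

Keeping the best path seen, rather than the final state, guarantees that the result is never worse than the initial order.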

4.2 TSP-based heuristic

Motivated by the results in Sect. C.2, we consider an algorithm that starts with the same tour for both vehicles (containing all nodes, e.g., an optimal TSP tour for the truck) and then successively deletes nodes in both tours to improve the solution. This is similar to the savings algorithm by Clarke and Wright (1964) but in a reversed fashion. Instead of merging tours until we obtain a feasible solution, we split both tours up while maintaining feasibility.

Algorithm 2 Heuristic-TSP

Algorithm 3 CalculateSavings

This algorithm, called Heuristic-TSP, is described in Algorithm 2, and the procedure for selecting the nodes to be removed is described in Algorithm 3. The idea of the latter is to calculate the savings obtained by removing node \(v_i\), which has to be a combined node, since otherwise its removal would not be feasible. If we delete this node from a tour, the corresponding vehicle can skip it, and consequently, there is no need to wait at \(v_i\) for the other vehicle. The resulting savings are given in line 13 of Algorithm 3. If we consider a distance-based formulation instead of (tbc_mct), the resulting savings for truck node \(t_i\) can be calculated as follows:

$$\begin{aligned} s(t_i):= c^t(t_{i-1},t_{i}) + c^t(t_{i},t_{i+1}) - c^t(t_{i-1},t_{i+1}), \end{aligned}$$
(30)

and for bike node \(b_i\)

$$\begin{aligned} s(b_i):= c^b(b_{i-1},b_{i}) + c^b(b_{i},b_{i+1}) - c^b(b_{i-1},b_{i+1}). \end{aligned}$$
(31)
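Equations (30) and (31) have the same form, so a single helper suffices. The Python sketch below assumes that a tour is a closed node sequence whose first and last entries are the depot and that costs are given as a function; the names are hypothetical. In Algorithm 3, only combined nodes are candidates for removal, which this helper does not check.

```python
from typing import Callable, List

Cost = Callable[[str, str], float]


def removal_savings(tour: List[str], i: int, cost: Cost) -> float:
    """Savings (30)/(31) of skipping tour[i]: c(prev, node) + c(node, next) - c(prev, next)."""
    if not 0 < i < len(tour) - 1:
        raise ValueError("only interior (non-depot) positions can be skipped")
    prev_node, node, next_node = tour[i - 1], tour[i], tour[i + 1]
    return cost(prev_node, node) + cost(node, next_node) - cost(prev_node, next_node)


pos = {"vt": 0.0, "a": 1.0, "b": 3.0}  # nodes on a line


def dist(v: str, w: str) -> float:
    return abs(pos[v] - pos[w])


truck_tour = ["vt", "a", "b", "vt"]
print([removal_savings(truck_tour, i, dist) for i in (1, 2)])  # [0.0, 4.0]
```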

For the experimental evaluation in Sect. 5, we use simulated annealing to calculate the TSP tour for the truck and only consider the time-based variant.

4.3 Heuristic based on reinforcement learning

In this section, we take a slightly different approach using reinforcement learning. The idea of reinforcement learning is that one or more so-called agents interact with a dynamic environment and get instant feedback about their actions and the resulting changes to the environment. It belongs to the field of artificial intelligence and is used, among other things, in combinatorial optimization, e.g., for solving the TSP (Gambardella and Dorigo 1995; Júnior et al. 2010; Alipour and Razavi 2015; Zhang et al. 2020), which motivates its application in our model.

We adapt Q-learning, a reinforcement learning method based on a Markov decision process (Buşoniu et al. 2010), to our setting. In each state s, the agent has information about the environment and the possible actions with their corresponding rewards. Here, the Q-function \(Q: S \times A \rightarrow \mathbb {R}\) approximates the expected return we get if we take the action \(a \in A\) in state \(s \in S\) and then follow an optimal policy, i.e., the sequence of actions that leads to the maximal cumulative return. It follows that if the Q-function is optimal, we obtain an optimal policy by choosing in every state s an action \(a'\) that maximizes \(Q(s,a')\).

To learn the Q-function, we start with an arbitrary one, and when we transition from state s to \(s'\) by taking action a and observing the reward r, we update Q(s,a) as follows:

$$\begin{aligned} Q(s,a) \longleftarrow Q(s,a) + \alpha \left( r + \gamma \max _{a' \in A} Q(s',a') - Q(s,a) \right) . \end{aligned}$$

Here, \(\alpha\) denotes the learning rate, and the term in parentheses is the difference between the current estimate Q(s,a) and the updated estimate \(r + \gamma \max _{a' \in A} Q(s',a')\) of the Q-value. The second hyperparameter, the discount factor \(\gamma\), weights future rewards relative to immediate ones and thus allows modeling uncertainty about future rewards.
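A minimal tabular sketch of this update rule in Python is shown below. It assumes that the Q-function is stored as a dictionary keyed by (state, action) pairs and that the maximum is taken only over the feasible (unmasked) next actions; treating the future term as zero when no feasible action remains is an assumption of this sketch. The default values of \(\alpha\) and \(\gamma\) follow the settings stated at the end of this section; all names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(Q: QTable, s: Hashable, a: Hashable, r: float, s_next: Hashable,
             next_actions: Iterable[Hashable], alpha: float = 0.8,
             gamma: float = 0.95) -> float:
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # Maximum over feasible next actions only; 0.0 if none remain (terminal step).
    future = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    td_error = r + gamma * future - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]


# Q initialized with negative edge costs, as described in the text.
Q = {("A", "B"): -2.0, ("B", "C"): -1.0, ("B", "A"): -3.0}
print(q_update(Q, "A", "B", r=-2.0, s_next="B", next_actions=["C", "A"],
               alpha=0.5, gamma=1.0))  # -2.5
```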

In our setting, the environment consists of the complete graph with nodes, edges, demands, positions and costs for both vehicles. We have one agent for the truck and one for the bike, each with its own Q-function, which we initialize with the negative costs of the respective cost function. The state s denotes the node that the corresponding vehicle is currently visiting (and the current load of the cargo bike), while the state space S includes all visitable nodes. The action space A contains all nodes that the corresponding vehicle can serve next without violating any feasibility constraints. To exclude nodes in A that have already been visited or that would make the bike exceed its capacity, we mask the corresponding actions, i.e., we temporarily set their Q-values to \(- \infty\). Consequently, the next visited node corresponds to action a. The reward of action a is the negative of the additional time (cost) that it adds to the current combined tour.

To balance exploration and exploitation, we use the epsilon-greedy approach, i.e., in each step, we select a random next node with probability \(\epsilon\) and otherwise the action with the highest Q-value. Afterward, \(\epsilon\) is multiplied by a decay factor \(0<\epsilon '<1\). After arriving at the next state, we update the Q-function, the visited nodes, the positions of the vehicles and the current load of the bike.
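The masked epsilon-greedy choice can be sketched in Python as follows. Masked actions (already visited nodes or nodes that would violate the bike capacity) receive a Q-value of \(-\infty\), exploration draws uniformly among the remaining feasible actions, and \(\epsilon\) decays geometrically by the factor \(\epsilon '\). How the mask is computed, the dictionary layout and the function names are assumptions of this sketch.

```python
import math
import random
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def epsilon_greedy(Q: QTable, state: Hashable, actions: Iterable[Hashable],
                   masked: Set[Hashable], epsilon: float,
                   rng: random.Random) -> Optional[Hashable]:
    """Random feasible action with probability epsilon, else the highest Q-value."""
    # Masking: infeasible actions get -inf, so they can never be selected.
    scores = {a: (-math.inf if a in masked else Q[(state, a)]) for a in actions}
    feasible = [a for a, q in scores.items() if q > -math.inf]
    if not feasible:
        return None  # no feasible next node from this state
    if rng.random() < epsilon:
        return rng.choice(feasible)
    return max(feasible, key=lambda a: scores[a])


def decayed_epsilon(step: int, eps0: float = 1.0, factor: float = 0.999) -> float:
    """Epsilon after `step` multiplicative updates with decay factor epsilon'."""
    return eps0 * factor ** step


Q = {("vb", "a"): -1.0, ("vb", "b"): -4.0, ("vb", "c"): -0.5}
rng = random.Random(0)
print(epsilon_greedy(Q, "vb", ["a", "b", "c"], masked={"c"}, epsilon=0.0, rng=rng))  # a
```

Here node "c" has the highest Q-value but is masked, so the greedy choice falls back to "a".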

To select the vehicle for the next step, we initialize the probability prob of choosing the truck as follows:

$$\begin{aligned} prob:= \dfrac{\sum _{e \in E} c^t(e)}{\sum _{e \in E} c^t(e) + \sum _{e \in E} c^b(e)}. \end{aligned}$$

Then, in each step, this value is updated as follows:

$$\begin{aligned} prob \longleftarrow 0.7 \cdot prob + 0.3 \dfrac{ c^t(\mathcal {T})}{ c^t(\mathcal {T}) + c^b(\mathcal {B})}, \end{aligned}$$

where \(c^t(\mathcal {T})\) and \(c^b(\mathcal {B})\) are the costs of the current truck and bike tour, respectively.
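The vehicle-selection rule can be sketched in Python as below. Edge costs are assumed to be stored in dictionaries, and since the update is undefined while both tours are still empty, the sketch leaves prob unchanged in that case; this fallback and all names are assumptions of the sketch.

```python
import random
from typing import Dict, Hashable


def initial_truck_prob(c_t: Dict[Hashable, float], c_b: Dict[Hashable, float]) -> float:
    """prob := sum_e c^t(e) / (sum_e c^t(e) + sum_e c^b(e))."""
    total_t, total_b = sum(c_t.values()), sum(c_b.values())
    return total_t / (total_t + total_b)


def update_truck_prob(prob: float, truck_tour_cost: float, bike_tour_cost: float) -> float:
    """prob <- 0.7 * prob + 0.3 * c^t(T) / (c^t(T) + c^b(B))."""
    total = truck_tour_cost + bike_tour_cost
    if total == 0:
        return prob  # assumption: no update while both tours are still empty
    return 0.7 * prob + 0.3 * truck_tour_cost / total


def choose_vehicle(prob: float, rng: random.Random) -> str:
    """Select the truck with probability prob, otherwise the bike."""
    return "truck" if rng.random() < prob else "bike"


prob = initial_truck_prob({"e1": 3.0, "e2": 3.0}, {"e1": 1.0, "e2": 1.0})  # 0.75
prob = update_truck_prob(prob, truck_tour_cost=10.0, bike_tour_cost=30.0)
print(round(prob, 6))  # 0.6
```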

A training episode ends once all nodes have been visited and both vehicles are back at their respective depots. Our training consists of 1000 episodes, and we choose an initial \(\epsilon =1\), \(\epsilon ' = 0.999\), \(\gamma = 0.95\) and \(\alpha = 0.8\).

5 Experimental evaluation

We evaluate six problem variants introduced in this paper on three classes of instances. The first class of artificial instances, \(\mathcal {I}_1(n, \delta )\), see Fig. 3, is used to evaluate which parameters influence how difficult the problems are to solve. We especially evaluate the runtime and gap to an optimal solution for various settings of the number of customers n and speedup \(\delta\). More precisely, for a solution with value SOL and an optimal solution with value OPT, the gap refers to \(\frac{SOL-OPT}{SOL}\).

The second and third classes of realistic instances, \(\mathcal {I}_W(n, C_b)\) and \(\mathcal {I}_M(n, C_b)\), see Fig. 7, consist of up to 250 addresses in Wuppertal and Münster, Germany, respectively. Here, we especially consider the solution quality compared to the TSP optimum.

Fig. 3 Instance \(\mathcal {I}_1(8, \delta )\) with the corresponding truck edge costs; the bike cost of edge \(e \in E\) is \(c^b(e):= \delta \cdot c^t(e)\)

The formulations with synchronized (time-based) costs are (tbc_mct), (tbc_mlt), (tbc_mst) with objective (25) and (tbc_mct2), which is the same formulation as (tbc_mct) but with up to two bikes as described in Sect. 3.2.5. Those with independent distance-based costs are (dbc_ws) and (dbc_os). For an overview of the objective functions, see Table 2. To solve the MIP formulations, we use Gurobi 8.1.1 (Gurobi 2019) and a time limit of 60 min for instances \(\mathcal {I}_1(n, \delta )\) and 180 min for instances \(\mathcal {I}_W(n, C_b)\) and \(\mathcal {I}_M(n, C_b)\), respectively.

5.1 What makes the problems hard to solve?

Our first instance class \(\mathcal {I}_1(n, \delta )\) consists of the example introduced in Lemma 11 in Appendix C.2, where we use six nodes, i.e., \(n=5\) customers. We extend this recursively by three nodes (i.e., \(n \in \{5,8,11,14,17,20,23,26,29,32,35 \}\)) while we maintain the structure of the instance and vary the parameter \(\delta \in [ 0.05,1]\) in steps of 0.05. Recall that \(\delta\) corresponds to the lowest speedup of the truck compared to the bike. In Fig. 3, instance \(\mathcal {I}_1(8,\delta )\) is shown.

As described in Lemma 11, the structure of this instance class is, in theory, ideally suited to (tbc_mct) and (tbc_mlt). Additionally, we know that for both formulations, the optimal solution value for \(\mathcal {I}_1(n, \delta )\) is \(\frac{2}{3} \delta (1+ n)\).
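The closed-form optimal value and the gap definition from the beginning of this section can be checked with a few lines of Python. The sketch uses exact fractions to avoid rounding; the function names are hypothetical, and the restriction to \(n \in \{5, 8, 11, \dots \}\) mirrors the construction of \(\mathcal {I}_1(n, \delta )\) described above.

```python
from fractions import Fraction


def optimal_value_i1(n: int, delta: Fraction) -> Fraction:
    """Optimal value 2/3 * delta * (1 + n) of I_1(n, delta) for (tbc_mct) and (tbc_mlt)."""
    if n < 5 or (n - 5) % 3 != 0:
        raise ValueError("I_1(n, delta) is built for n = 5, 8, 11, ...")
    return Fraction(2, 3) * delta * (1 + n)


def gap(sol: Fraction, opt: Fraction) -> Fraction:
    """Gap (SOL - OPT) / SOL, relative to the value of the solution found."""
    return (sol - opt) / sol


opt = optimal_value_i1(8, Fraction(1, 2))
print(opt, gap(Fraction(4), opt))  # 3 1/4
```

The same value can equivalently be written as \(\delta \left( 4 + \frac{2}{3}(n-5)\right)\), since \(4 + \frac{2}{3}(n-5) = \frac{2}{3}(1+n)\).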

Fig. 4 Instance \(\mathcal {I}_1(n, \delta )\): Mean values of runtimes and gaps depending on the value of \(\delta\) and number of customers n

Fig. 5 Instance \(\mathcal {I}_1(n, \delta )\): Runtimes depending on the value of \(\delta\) and averaged over all n

Solving MIP formulations We first consider the influence of the parameters n and \(\delta\) on the runtime of solving the MIP formulations, see Fig. 4 as well as Tables 3 and 4. Note that the runtime of the distance-based models (dbc_ws) and (dbc_os) is considerably lower than that of the time-based models (tbc_mct) and (tbc_mlt): on average, the distance-based models can be solved about 5 times faster than the time-based ones (within a time limit of 60 minutes). This can be explained by the synchronization constraints (described in Lemma 4) and by the fact that the time-based models minimize tour durations including waiting times, which are neglected in the distance-based formulations (especially in (dbc_os)). Tables 3 and 4 show that for the time-based models, the time limit of one hour leads to suboptimal solutions for \(n\ge 11\). However, the distance-based models can be solved to optimality within the time limit up to \(n=26\) and \(n=17\), respectively. Therefore, we consider the runtime of the distance-based models in Fig. 4a. As expected, increasing the number of demand points n increases the runtime. However, the speedup factor \(\delta\) also has a large influence on the runtime: instances with \(\delta \in [0.15,0.35]\) take considerably more time to solve. A similar correlation can be observed for the time-based models (tbc_mct) and (tbc_mlt) in Fig. 4b, although the influence of the speedup \(\delta\) is less pronounced. Note that for \(n\ge 20\), the problem could not be solved to optimality for any \(\delta\), so we report the gap of the best solution found within the time limit to the theoretically optimal solution value \(\delta \left( 4 + \frac{2}{3} (n-5) \right) = \frac{2}{3}\delta (1+n)\), which can be derived from the example in Lemma 11.
Note that this gap is considerably tighter than the MIP gap reported by Gurobi: averaged over n, the MIP gap ranged between 68 and 79%, whereas the gap to an optimal solution of (tbc_mct) and (tbc_mlt) varied from 25% to 55% depending on \(\delta\).

Table 3 Results for algorithms (dbc_os), (dbc_ws) and (tbc_mst)
Table 4 Results for algorithms (tbc_mlt), (tbc_mct) and (tbc_mct2)

To further investigate the influence of the speedup factor \(\delta\) on the runtime, we analyze the runtime of the models for varying values of \(\delta\) averaged over all considered n in Fig. 5. As the runtime of the distance-based models is considerably shorter, we depict these separately in Fig. 5a. Here, we observe that adding synchronization constraints in (dbc_ws) significantly increases the runtime and that \(\delta =0.15\) results in the most difficult problems. When further increasing \(\delta\), the runtime reduces significantly.

For the time-based models, Fig. 5b shows a different correlation: for increasing \(\delta\), the average runtime increases. Only for \(\delta =1\), i.e., when truck and bike have the same speed, does the runtime decrease again, possibly due to Lemma 14, which shows that for \(\delta = 1\) the problem reduces to a TSP. This suggests that adding the synchronized drive time and the waiting time to the objective structurally changes the problem and the solution process.

More detailed results can be found in Tables 3 and 4 in Appendix D.1.

Fig. 6 Instance \(\mathcal {I}_1(n, \delta )\): Gap values (in percent) of the algorithms depending on the value of \(\delta\) for the time-based model (tbc_mct). Each boxplot represents the variation of the gap over n. Here, the black line represents the median, and the box represents the 25th to 75th percentile of the gap values

Heuristics In Fig. 6, we consider the influence of \(\delta\) and n on the solution quality of Heuristic-Clustering, Heuristic-TSP and Q-learning for the time-based model (tbc_mct). For Heuristic-TSP and Q-learning, the influence of n seems to be marginal, while the solution quality improves with increasing \(\delta\), although this effect is rather small for \(\delta >0.4\). Similar behavior can be observed for Heuristic-Clustering, but here the solution quality varies considerably with n. This is probably due to the short time limit of 60 s for solving the MIP that is part of its solution process.

Averaged over all instances \(\mathcal {I}_1(n,\delta )\), we observe gaps of 36, 44, 54 and 49% for the MIP, Heuristic-TSP, Q-learning and Heuristic-Clustering, respectively. If we exclude the five smallest instance sets, i.e., consider only instances with \(n \ge 20\), the gaps are 59, 46, 60 and 70% for the same solution methods. Thus, the heuristic solutions are competitive with MIP solutions for sufficiently large instances. This is especially relevant as the solution times of the heuristics are considerably lower overall (below 11 seconds for Heuristic-TSP, 60 seconds for Q-learning and 80 seconds for Heuristic-Clustering) than the runtimes of the MIP solver. While the runtime of the heuristics increases with increasing n, it appears to be independent of \(\delta\).

More detailed results can be found in Table 8 in Appendix D.1.

5.2 Improvements from TSP

The second and third instance classes \(\mathcal {I}_W(n, C_b)\) and \(\mathcal {I}_M(n, C_b)\) consist of \(n \in \{10, 20, 50, 100, 250 \}\) addresses in Wuppertal (Fig. 7a) and Münster (Fig. 7b), respectively, with the corresponding distances and durations of the bike and truck. For each instance size, all associated locations are included in the next larger instance. We use the data provided by OpenStreetMap (Boeing 2017; OpenStreetMap contributors 2017) and typical speeds for the corresponding road types and vehicles. The bike depot is located in the inner city (red markers in Fig. 7), and the truck depot is in the northernmost and southernmost part of the map, respectively (blue markers in Fig. 7). All customers have a uniform demand of 1, and the capacity of the bike is 50. Even though this is a common capacity in real-world applications, such a high value is not reasonable for instances with fewer than 100 nodes. Therefore, we downsize the capacity for these instances to \(C_b = 0.2 n\). Additionally, we analyze the impact of varying the bike capacity.

Fig. 7 \(\mathcal {I}_W(n, C_b)\) and \(\mathcal {I}_M(n, C_b)\). The red triangles correspond to \(n=10\) and the additional markers to the next larger instances: purple squares (\(n=20\)), orange pentagons (\(n=50\)), blue triangles (\(n=100\)) and green circles (\(n=250\)). The big red marker in the inner city represents the bike depot, and the blue one outside the city represents the truck depot (color figure online)

With the selection of these instances, we pursue three main goals. The first is to compare two cities with different structures: while Münster is known to be very bike-friendly, Wuppertal is not. In Table 5, the different values of \(\delta\) and the average ratios of the bike and truck weights underline this difference. The second goal is to investigate the influence of the bike capacity and to compare the optimal solutions for the different objective functions. Finally, we study the behavior of our three heuristics with increasing instance size and compare their results to the optimal TSP objective value. Note that in the TSP solution, the truck visits each node, including the bike and truck depot.

Comparing the solution structure between the models First, we consider the structure of solutions for the different time- and distance-based models for both Wuppertal \((\mathcal {I}_W(20,4))\) and Münster \((\mathcal {I}_M(20,4))\). As in the previous case, the distance-based models are solved to optimality within a few seconds, while the runtimes of the time-based models are significantly higher. Here, the MIP gap of (tbc_mct2) ranged between 17 and 36%, while all remaining formulations are solved to optimality.

Fig. 8 Truck, bike and tour durations of different models with \(C_b = 4\) for Wuppertal and Münster (\(n=20\))

Fig. 9 Truck tour duration depending on \(C_b\) and model; \(100\%\) corresponds to the duration of the truck TSP tour (\(n=20\))

Figure 8 shows the time spent driving and waiting for both the truck and the bike, normalized by the duration of a TSP truck tour. As expected, there are considerable differences between the distance- and time-based models, as well as between Wuppertal and Münster.

For the distance-based models, there are not enough shortcuts for the bike; since, by construction, both vehicles have to be used, the combined tour length is longer than the TSP tour in both cases. The synchronization in (dbc_ws) additionally results in high waiting times, so the distance-based models provide no advantage over the TSP tour regarding the completion time.

In contrast, all time-based models reduce the duration of the truck tour significantly, and both the completion time and the longest tour duration are lower than the duration of the TSP tour. Additionally, we observe very little waiting time in the time-based models, except for the model minimizing the completion time with two bikes, (tbc_mct2). For this model, the driving times of both bikes are summed, so the total bike driving time significantly exceeds that of the other models.

Note that for all models, both the duration of the truck tour and the completion time of the combined tour are lower for Münster than for Wuppertal. This can be attributed to the fact that for Münster, the mean speed difference between the bike and the truck is lower than for Wuppertal, see Table 5.

Influence of the bike capacity From an environmental perspective, reducing the truck tour is most important. In Fig. 9, we consider the influence of the bike capacity on the reduction of the truck tour duration for each model. For Wuppertal (\(\mathcal {I}_W(20,C_b)\)), Fig. 9a shows that, depending on the model, the truck tour duration can be reduced by up to 46%, with the time-based models achieving significantly larger reductions than the distance-based ones. However, increasing the bike capacity beyond 4, i.e., 20% of the truck capacity, has hardly any effect on the duration of the truck tour. For Münster (\(\mathcal {I}_M(20,C_b)\)), on the other hand, the reduction of the truck tour duration is more pronounced, reaching up to 51%, see Fig. 9b. Here, increasing the bike capacity has a slightly larger effect on the truck tour duration, especially for the time-based model (tbc_mst), which minimizes the summed tour duration.

Table 5 Average ratios of bike and truck weights and values of \(\delta\) of the different instances \(\mathcal {I}_W(n, C_b)\) and \(\mathcal {I}_M(n, C_b)\)
Table 6 Runtimes of the different heuristic algorithms for solving (tbc_mct) on instances \(\mathcal {I}_W(n, 0.2n)\) and \(\mathcal {I}_M(n, 0.2n)\)

Heuristic solutions for larger instances For both instances \(\mathcal {I}_W\) and \(\mathcal {I}_M\), the time limit of three hours allowed for solving instances of up to 20 demand points to optimality using the commercial MIP solver (Gurobi 2019). However, larger instances could not be solved optimally within this time frame, such that we consider the heuristics Heuristic-Clustering, Heuristic-TSP and Q-learning for solving (tbc_mct) as introduced in Sect. 4. Table 6 shows that even for \(n=250\), the runtimes of Heuristic-TSP and Heuristic-Clustering are below two minutes, while the runtime of Q-learning increases to up to 18 min.

The difference between the two cities, Wuppertal and Münster, is also reflected in the quality of the heuristic solutions. Figure 10 depicts the objective value of (tbc_mct), i.e., the completion time of the heuristic solutions, for a varying number of demand points. Here, the objective is normalized by the duration of a TSP truck tour for the same instance to facilitate comparison. First, note that for Wuppertal (Fig. 10a), almost no solution improves upon the TSP tour: only for 10 and 20 demand points do Heuristic-TSP and the MIP formulation find solutions with a shorter completion time than the duration of the TSP tour. Comparing the heuristics with one another, Heuristic-Clustering performs worst, while Q-learning performs best for larger instances with \(n \ge 50\). For small instances with \(n \le 20\), Heuristic-TSP performs best and is comparable to the optimal solution computed by the MIP solver, with gaps of 9% and 20% for \(n=10\) and \(n=20\), respectively.

A similar behavior can be observed for Münster, see Fig. 10b, where Q-learning also finds the best solutions for large instances with \(n \ge 100\) while Heuristic-Clustering performs worst. The most important difference, however, is that for all instances, the best heuristic solution outperforms the TSP solution. Even for \(n=250\), the completion time could be reduced to 75% of the TSP tour duration. Thus, the modeling approach developed in this paper helps to reduce the completion time of the delivery process and, consequently, the duration of truck tours.

Fig. 10 Completion time, i.e., objective value of (tbc_mct), in percent depending on the number of demand points n and the heuristic. 100% corresponds to the duration of the truck TSP tour of the corresponding instance size. Note that \(C_b=0.2n\)

All results can be found in Tables 9, 10 and 11 in Appendix D.2.

As the results show, Heuristic-Clustering has the advantage of being effective on small instances, as it can often solve them exactly. However, due to the time limit and the exact solution process, which can become time-consuming, it performs worse on larger instances. Concerning the graph structure, it is more suitable for instances with evenly distributed customers, as customers in the same cluster are then closer together, allowing for more efficient bike routing. Heuristic-TSP also performs better on smaller instances, since there, optimal CTBRP solutions share more similarities with optimal TSP solutions, while for larger instances, optimal CTBRP solutions may deviate structurally from the TSP solution. Q-learning performs consistently well across all instance sizes, provided that the bike can serve a larger number of customers without meeting the truck after each stop. This allows Q-learning to generate practical subtours for the instance classes \(\mathcal {I}_W(n, C_b)\) and \(\mathcal {I}_M(n, C_b)\), as opposed to Heuristic-TSP, which is more suitable for \(\mathcal {I}_1(n, \delta )\), where bike and truck nodes alternate.

6 Conclusion and future research

In this paper, we present a new concept for last-mile deliveries using two already established means of transport: the delivery truck and the cargo bike. We model this problem such that the truck both delivers goods and serves as a moving mini-depot for the cargo bike, which has to be resupplied during the delivery route. By using the truck to restock the cargo bike, we do not need to construct mini-depots throughout the city. This makes our approach more flexible with respect to changing demands and independent of available construction sites, while also reducing investment costs. We develop a MIP formulation in various versions, focusing on minimizing either delivery time or distance, and extend it to include multiple cargo bikes. Moreover, we provide upper bounds on the improvement compared to truck-only delivery and theoretically analyze the complexity of the problem. To connect our approach to other delivery concepts from the literature, we show how our models can be adapted to cover these concepts as well.

In an experimental proof of concept, we analyze the performance of our different MIP formulations and show the advantages of our approach compared to truck-only delivery. These promising results show that our approach can outperform the traditional TSP approach in terms of completion time while reducing the distance driven by the truck, and they motivate further research in this field.

As a consequence of the limitations of MIP formulations in solving larger instances, we develop three heuristic solution approaches that can provide comparable solutions in a short amount of time. Due to the versatility of our model, covering a broad class of problems, those approaches might be useful to solve further problems in the literature, or conversely, we might adapt established methods. On the one hand, it would be interesting to consider the delivery problem as a multi-criteria problem where tour duration and distance covered are minimized simultaneously. On the other hand, uncertainties in the drive times have to be taken into account for making the model viable in practice. Here, it would be especially interesting to consider the case where truck and cargo bike are not affected in the same way by delays, e.g., when the bike can use a separate bike path and is not affected by traffic jams. Another interesting variant of the delivery problem is to integrate public transport planning, e.g., by using an already existing public transport line and fixing the corresponding nodes.

As part of future work, we plan to adapt deep Q-learning to learn the Q-values using a neural network and to refine the reward function. Moreover, to reduce the problem complexity, we plan to use a two-step approach in which the assignment of stops to the truck and the cargo bike is fixed first and the routes are constructed afterward.