Introduction

In modern society, a fine-grained division of labor and fast-paced lifestyles leave most people with little time to cook for themselves. Therefore, more and more people resort to takeaway (take-out) food, which can be selected online and brought by deliverymen to their homes or offices. This demand has boosted the food takeaway market in recent years. According to data from Statista [36], the total food service delivery market in the UK was worth around 8.5 billion British pounds in 2019, 55% of which came from online orders. In China, from 2015 to 2019, the total amount of takeaway orders increased from 134.8 billion RMB yuan to 603.5 billion, the penetration (i.e., the ratio of the total takeaway order amount to the national catering revenue) increased from 4.2 to 14.2% (Fig. 1), and the number of online takeaway customers reached 421 million in 2019, accounting for 49.3% of all netizens (Fig. 2, data from Trustdata [38]). Takeaway has become one of the most popular and fastest-growing service industries in the country.

The takeaway industry has provided numerous jobs, particularly deliveryman jobs, for the society. In popular online takeaway ordering and delivery platforms, such as Meituan Waimai and Baidu’s Eleme, the typical workflow can be described as follows (as illustrated in Fig. 3):

  1. Customers place orders online;

  2. Takeaway stores receive the orders, determine which orders they accept, and post the accepted order information online (visible to deliverymen);

  3. Deliverymen browse the candidate orders and select those they want to deliver;

  4. Deliverymen go to the stores, and if orders are ready, pick up the orders and deliver them to the corresponding customers.

With the rapid growth of the takeaway industry, the workload of deliverymen has increased dramatically. For example, in 2019, the average daily number of takeaway orders delivered per deliveryman was around 40, and the average delivery time per order was around 30 min. To improve working efficiency and the corresponding revenue, every deliveryman wants to optimize his order selection and delivery path routing decisions. However, the number of candidate orders is often large, different orders have different delivery fees, and their pickup points (i.e., takeaway stores) and service points (i.e., customer locations) are spread over different locations. It is therefore difficult for a deliveryman to optimize order selection and delivery path planning so as to maximize his revenue.

Fig. 1 The development of the takeaway industry in China from 2015 to 2019

Fig. 2 The scale and utilization rate of online takeaway users in China from 2015 to 2019

Fig. 3 A typical takeaway order processing workflow

This paper is a substantial extension of a conference paper [40]. The conference paper proposed a hybrid optimization algorithm for a basic problem of takeaway order selection and delivery path planning. The basic problem assumed that the ready time of each order is exactly known in advance; however, in practice, the actual order ready time often deviates widely from the expected order ready time, which can badly affect the work efficiency of deliverymen. Moreover, the basic problem did not consider customers’ scoring on orders, which, in real-world takeaway delivery platforms, is an important factor influencing the revenue of deliverymen. In addition, the basic problem did not limit the delivery path length, which may cause infeasible solutions. In this paper, we add a limit on the maximum delivery path length to the problem, and use a machine-learning approach to estimate both the order ready time and the customer satisfaction level based on the historical habit data of takeaway stores and customers, so as to evaluate delivery solutions more accurately. For the extended problem, we adapt the hybrid optimization algorithm to optimize the revenue of the deliveryman more effectively, and conduct more extensive experiments to validate the performance of the proposed method. The main contributions of this paper can be summarized as follows:

  • We present a problem of takeaway order selection and delivery path planning for deliverymen, which utilizes store and customer habit data to better estimate order ready time and customer satisfaction level;

  • We propose a hybrid evolutionary algorithm to efficiently solve the problem;

  • We demonstrate the performance of the proposed method on test instances constructed based on real-world data of online takeaway platforms.

In the rest of the paper, we first review related work and then present the problem of takeaway order selection and delivery path planning for deliverymen. Next, we propose the machine-learning approach for estimating the problem parameters and the hybrid evolutionary algorithm for solving the problem, and validate the proposed method in computational experiments. Finally, we conclude with a discussion.

Related work

With the rapid growth of electronic commerce, vehicle routing problems (VRPs) for scheduling vehicles to deliver goods to a given number of service points (customers) have been extensively studied in the literature [13]. Planning deliverymen’s paths passing through a set of pickup points and drop-off points subject to delivery time limits can be categorized as a special class of VRPs, known as the pickup and delivery problem with time windows (PDPTW) [12]. For these NP-hard problems, traditional mathematical programming methods are only effective on small-size problem instances; therefore, metaheuristic and evolutionary algorithms, such as genetic algorithms (GAs) and particle swarm optimization (PSO), are widely used to find near-optimal solutions for medium- and large-size problem instances within an acceptable solution time [1, 4, 6].

In addition to common features of VRP and PDPTW, takeaway delivery path planning has some special features. First, as meals are perishable and customers are often waiting anxiously, takeaway orders are typically expected to be delivered within a short time (an hour or even much less) and within minutes of the food becoming ready. Hsu et al. [17] presented a VRP with time windows (VRPTW) for delivering perishable food from a distribution center, the objective of which considers not only the costs for dispatching vehicles, but also those of transportation, inventory, energy and penalty costs for violating time windows. Huang et al. [18] applied an ant colony optimization (ACO) algorithm to plan the delivery route to minimize the total time for takeaway distributions. Reyes et al. [32] formalized a meal delivery routing problem to model the essential structure of dynamic delivery systems, and developed an algorithm based on rolling-horizon repeated matching to solve courier assignment and capacity management. Gao and Jiang [14] applied a fireworks algorithm [54] to optimize takeaway delivery paths under the condition of safety. Yu and Luo [43] studied an online PDPTW with a single pickup point for routing a deliveryman with a constant capacity to serve requests released over time so as to minimize the total latency. They proved the lower bound of this problem for various capacities of the deliveryman, and presented wait-and-return and wait-and-ignore online algorithms for the half-line case. Yildiz and Savelsbergh [42] presented a meal delivery routing problem that assumes perfect information about order arrivals; they proposed a simultaneous column- and row-generation method to solve the problem. Liao et al. [23] presented a green meal delivery routing problem with the objectives to simultaneously maximize customer satisfaction and rider balance utilization and minimize carbon footprint; they proposed an algorithm based on the nondominated sorting GA [11] and adaptive large neighborhood search for the problem. Liu and Liu [26] presented an integrated production and distribution problem with a single machine, multiple customers, and homogeneous vehicles; they solved this problem using an improved large neighborhood search algorithm. Shan et al. [29] proposed a deep reinforcement learning approach combined with Dijkstra’s algorithm for food delivery route planning, which can provide accurate navigation when road network information is unknown. Ulmer et al. [39] studied a stochastic PDPTW for delivering food from a set of restaurants to ordering customers. They presented an anticipatory customer assignment policy, which is able to improve service significantly for all stakeholders. To solve an integrated problem of production-inventory-routing of perishable goods with transshipment and uncertain demand, Liu et al. [28] presented an algorithm that begins with an initial solution and then iteratively improves it using two local search strategies, namely inserting the best and removing the worst solutions. In [30], Liu considered on-demand meal delivery service using drones, and proposed a progressive algorithm for drone dispatch and order delivery in a dynamic, real-time operational environment. When addressing a stochastic online route-planning problem, Zheng et al. [48] proposed an end-to-end deep-learning model for finding optimal routes in milliseconds by learning a policy from training data.

Second, orders are not available for pickup at the beginning of the planning period, which was considered by Liu et al. [25] in the capacitated VRP with order available time and solved using a tabu search algorithm. In [27], the authors proposed a hybrid harmony search and tabu search algorithm for the problem. Li et al. [22] studied a similar VRP with order release time, where a vehicle often needs multiple trips due to the relatively short delivery distance. They proposed an adaptive large neighborhood search algorithm combined with a labeling procedure for the problem.

Third, one customer may order food multiple times from a store or from multiple stores. Consolidating orders of the same customer can reduce the delivery times and distance. However, multiple deliveries to the same customer cannot be completely removed. Zhang et al. [46] presented an integer programming model of order consolidation aiming to reduce the number of trips, while achieving a tradeoff between splitting and consolidating orders; they proposed a three-phase heuristic algorithm to solve the problem, and demonstrated the superiority of the order consolidation approach over the first-in-first-out approach. To solve a time-critical third-party logistics problem with order consolidation and transshipment point selection, Salhi et al. [33] proposed an effective metaheuristic based on the greedy randomized adaptive search procedure. Soman and Patil [35] studied a heterogeneous VRP with release and due dates in the presence of order consolidation and warehousing capacity limits; they proposed a scatter search method with strategic oscillation, which is able to solve large-size instances. Ji et al. [20] proposed a method for grouping food delivery tasks to improve food delivery efficiency, using heuristics consisting of a greedy algorithm and a replacement algorithm.

Most existing studies either integrate order assignment and delivery path routing, or assume that the orders have been assigned and hence focus on path routing. Nevertheless, in takeaway delivery systems, deliverymen are not simply passive entities; instead, they create their own “organic algorithms” to manage, and in some cases, even subvert the system [37]. For example, in popular food delivery platforms such as Baidu Deliveries, Eleme, and Meituan, deliverymen pay close attention to “grab orders” to improve their revenue. However, studies on deliverymen’s proactive strategies for takeaway order selection are relatively few. İç et al. [19] studied an order selection problem for a bakery firm, for which they used a fuzzy TOPSIS method to obtain the order ranking incorporated in the knapsack problem to determine the lot size and which orders to select. Ma et al. [31] considered a combined order selection and VRP for perishable product delivery, for which they proposed a hybrid ACO and local search method. Zhang and Liu [44] formulated a takeaway distribution problem as a bi-objective, mixed integer programming model; they proposed a two-stage solution strategy based on human–computer interaction to solve the problem. Nevertheless, to the best of our knowledge, there is no study on methods integrating order selection and path planning for takeaway deliverymen, whose revenue not only consists of the basic delivery fee and the overdue penalty of each order, but also is subject to a reward/penalty for a large/small number of orders and a reward for high customer scoring.

Problem formulation

Basic inputs

The considered problem aims to make an optimal decision of order selection and delivery path planning for a deliveryman. There is a set O of n candidate orders. For each order \(o\in O\), the pickup point (store) is denoted by \(p_o\), the expected order ready time is denoted by \(r_o\), and the corresponding service point (customer) is denoted by \(s_o\). For convenience, we use \(p_0\) to denote the initial location of the deliveryman, and let \(P=\big (\cup _{o\in O}p_o\big )\bigcup \big (\cup _{o\in O}s_o\big )\bigcup \{p_0\}\) be the set of all pickup points, service points, and the initial location of the deliveryman. The travel time between each pair of points i and j is denoted by \(\varDelta t(i,j)\) (\(\forall i,j\in P\)). The vehicle (typically, an electric bicycle) of the deliveryman has a maximum travel distance; here, we transform the maximum distance to the maximum travel time \(\widehat{T}\), which neglects acceleration and deceleration along the path for simplicity.

If an order o is selected by the deliveryman, the basic delivery fee is \(v_o\), and the order must be delivered to \(s_o\) before the delivery deadline \(\widehat{t}_o\) to earn the full delivery fee. If the actual delivery time is later than \(\widehat{t}_o\), an overdue penalty will be imposed. In this study, we consider a three-level overdue penalization rule that is employed by most food delivery platforms in China as follows:

  • If the overdue time is at most 15 min, the basic delivery fee is reduced by a percentage \(e_1\);

  • If the overdue time is more than 15 min but at most 30 min, the basic delivery fee is reduced by a percentage \(e_2\);

  • If the overdue time is longer than 30 min, no delivery fee is paid, and an additional penalty of a percentage \(e_3\) of the basic delivery fee is deducted.

The delivery platform also encourages deliverymen to take more orders: if the number of orders completed by a deliveryman in a given period (e.g., per week) exceeds a threshold, an additional reward will be granted; on the contrary, if the number is below a lower limit, his base salary will be deducted. To reflect this effect on the deliveryman’s revenue per unit time, in this problem, we set a lower limit \(\underline{n}_a\) and two reward thresholds \(n^\dag _a\) and \(n^\ddag _a\) (\(\underline{n}_a\!<\!n^\dag _a\!<\!n^\ddag _a\)) on the number \(n_p\) of orders per hour completed by the deliveryman, and use the following rule according to the reward/penalization levels in popular platforms and number conversion based on average working hours:

  • If \(n_p\) is less than the lower limit \(\underline{n}_a\), there is an additional penalty of \(\epsilon _1(\underline{n}_a-n_p)\) yuan;

  • If \(n_p\) is between \([n^\dag _a,n^\ddag _a)\), there is an additional reward of \(\epsilon _2\) yuan per order;

  • If \(n_p\) reaches or exceeds \(n^\ddag _a\), there is an additional reward of \(\epsilon _3\) yuan per order.

Moreover, when the order is completed, if the customer gives a five-star (highest) score on the order delivery, the deliveryman receives a reward of \(e\).

Note 1

The possibility of negative scoring and the corresponding penalty are not considered in the revenue; otherwise, orders from customers who tend to give low scores might not be selected by any deliveryman.

Note 2

The above rules and parameters can be adjusted and tailored to different delivery platforms without affecting our formulation and solution method.

Table 1 The inputs of the problem

Uncertain factors

In particular, in this problem, we consider two uncertain factors. The first is that, at the beginning of the planning period, the expected ready time \(r_o\) of each order is estimated and given by the store, but the estimation is not always accurate. In many cases, the actual ready time, denoted by \(\widehat{r}(o)\), is later, which will postpone all subsequent orders, and therefore, has a significant side effect on the delivery time. We employ a data-driven, machine-learning approach described in the next section to estimate \(\widehat{r}(o)\) based on \(r_o\) and the historical habit data of the store.

The second is about customer satisfaction level. Although the satisfaction level generally depends on the delivery time [7], some customers are more likely to give five-star scores, while others are not. We also employ a machine-learning approach described in the next section to estimate the probability \(\rho (o)\) that the customer of the order will give a five-star score on the delivery (under the condition that the delivery time is not overdue) based on the delivery time and the historical habit data of the customer.

Table 1 lists the above input variables of the problem.

Decision variables

The deliveryman needs to select a subset \(O_{\mathbf {x}}\) of orders from the candidate order set O, and then determine a path to deliver the selected orders. Therefore, the decision variables of the problem can be represented by the following three parts:

  • An n-dimensional vector \(\mathbf {x}=\{x_1,x_2,\dots ,x_n\}\), where \(x_k=1\) denotes that \(o_k\) is selected and \(x_k= 0\) otherwise (\(1\le k\le n\)); then the subset of selected orders is \(O_{{\mathbf {x}}}=\{o_k|o_k\in O \wedge x_k=1\}\).

  • A sequence \(\mathbf {y}=\{y_1,y_2,\dots ,y_{l(\mathbf {y})}\}\) of the set of all pickup points and service points of the orders in \(O_{{\mathbf {x}}}\), where \(l(\mathbf {y})\) denotes the length of \(\mathbf {y}\). In other words, \(\mathbf {y}\) represents the delivery path of the deliveryman. For each \(y_j\) in \(\mathbf {y}\), if \(y_j\) is a pickup point, we let \(O(y_j)= \{o|o\in O_{\mathbf {x}}\wedge p_o=y_j\}\) be the set of orders from store \(y_j\), and suppose that the orders in \(O(y_j)\) are sorted in increasing order of ready time; if \(y_j\) is a service point, we let \(O'(y_j)= \{o|o\in O_{\mathbf {x}}\wedge s_o=y_j\}\) be the set of orders for customer \(y_j\).

  • For each pickup point \(y_j\) in \(\mathbf {y}\), an integer \(z(y_j)\) that denotes the deliveryman’s decision on how many orders to pick up from \(y_j\) at this visit. That is, if \(|O(y_j)|=1\), then \(z(y_j)\) is 1; else, \(z(y_j)\) is an integer in \([1,|O(y_j)|]\), indicating that the deliveryman will pick up the first \(z(y_j)\) of these \(|O(y_j)|\) orders and then leave (and will come back later for the remaining orders, if any).

Note 3

As the deliveryman may visit a store or a customer more than once if the store or customer is related to multiple orders, the length of the sequence \(\mathbf {y}\) is variable; in any case, \(l(\mathbf {y})\) is at most \(2|O_{{\mathbf {x}}}|\). For simplicity, we never place the initial location \(p_0\) of the deliveryman in the sequence.

Calculation of delivery time and revenue

The actual delivery time of each order depends on the order ready time, pickup time, and delivery path \(\mathbf {y}\). Obviously, the first point \(y_1\) in \(\mathbf {y}\) must be a pickup point, and the time at which the deliveryman arrives at \(y_1\) is

$$\begin{aligned} t(y_1)= \varDelta t(p_0,y_1). \end{aligned}$$
(1)

At the first pickup point \(y_1\), the deliveryman’s decision is to pick up the first \(z(y_1)\) orders in \(O(y_1)\) and then leave \(y_1\). Let O[z] denote the z-th element in O; the time at which the deliveryman leaves \(y_1\) is

$$\begin{aligned} t'(y_1)= \max \big (t(y_1), \widehat{r}(O(y_1)[z(y_1)])\big ) \end{aligned}$$
(2)

Afterward, we remove the first \(z(y_1)\) orders from \(O(y_1)\):

$$\begin{aligned} O(y_1)= O(y_1)\backslash \{O(y_1)[1..z(y_1)]\} \end{aligned}$$
(3)

The times at which the deliveryman arrives at and leaves each subsequent point \(y_j\) in \(\mathbf {y}\) can be iteratively calculated as follows \((2\!\le \!j\!\le \!l(\mathbf {y}))\):

$$\begin{aligned} t(y_j)= & {} t'(y_{j-1}) + \varDelta t(y_{j-1}, y_j) \end{aligned}$$
(4)
$$\begin{aligned} t'(y_j)= & {} {\left\{ \begin{array}{ll} t(y_j),&{} y_j \text { is a service point}\\ \max \big (t(y_j), \widehat{r}(O(y_j)[z(y_j)])\big ),&{} y_j \text { is a pickup point} \end{array}\right. } \end{aligned}$$
(5)

When leaving each pickup point \(y_j\), we remove the first \(z(y_j)\) orders from \(O(y_j)\):

$$\begin{aligned} O(y_j)= O(y_j)\backslash \{O(y_j)[1..z(y_j)]\} \end{aligned}$$
(6)

When arriving at each service point \(y_j\), for each order \(o\in O'(y_j)\), if the order has been picked up before \(y_j\), then its delivery time d(o) is determined:

$$\begin{aligned} d(o)= t(y_j), \quad \forall o\in O'(y_j)\wedge (\exists j'\!<\!j: o\text { is among the first }z(y_{j'})\text { orders in }O(y_{j'})) \end{aligned}$$
(7)
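As an illustration, the recursion in Eqs. (1)–(7) can be traced with a short Python sketch. This is a minimal sketch under stated assumptions, not the implementation used in this paper: the function name `simulate`, the encoding of the path as `(kind, point)` pairs, the list `z` with one pickup decision per pickup visit, and the toy instance on a line are all hypothetical choices made for the example.

```python
def simulate(path, z, orders, travel, start="p0"):
    """Trace Eqs. (1)-(7) and return (delivery times, completion time).

    path   : list of (kind, point), kind "P" = pickup point, "S" = service point
    z      : one pickup decision per "P" entry of path, in visiting order
    orders : dict order_id -> (store, customer, actual_ready_time)
    travel : function (a, b) -> travel time in minutes
    All names and the data layout are assumptions of this sketch.
    """
    # O(y): orders still waiting at each store, sorted by ready time
    waiting = {}
    for oid, (store, _, ready) in orders.items():
        waiting.setdefault(store, []).append(oid)
    for queue in waiting.values():
        queue.sort(key=lambda oid: orders[oid][2])

    carried, delivered = set(), {}
    decisions = iter(z)
    here, leave, arrive = start, 0.0, 0.0
    for kind, point in path:
        arrive = leave + travel(here, point)            # Eqs. (1) and (4)
        if kind == "P":
            k = next(decisions)
            batch = waiting[point][:k]
            # wait until the last order of the batch is ready, Eqs. (2) and (5)
            leave = max(arrive, orders[batch[-1]][2])
            waiting[point] = waiting[point][k:]         # Eqs. (3) and (6)
            carried.update(batch)
        else:
            leave = arrive                              # Eq. (5), service point
            for oid in sorted(o for o in carried if orders[o][1] == point):
                delivered[oid] = arrive                 # Eq. (7)
                carried.discard(oid)
        here = point
    return delivered, arrive


# Toy instance: all points lie on a line, travel time = distance in minutes.
pos = {"p0": 0, "A": 5, "c1": 10, "B": 12, "c2": 20}
travel = lambda a, b: abs(pos[a] - pos[b])
orders = {"o1": ("A", "c1", 8), "o2": ("B", "c2", 10)}
path = [("P", "A"), ("S", "c1"), ("P", "B"), ("S", "c2")]
delivered, completion = simulate(path, [1, 1], orders, travel)
```

In this toy instance the deliveryman reaches store A at minute 5 but must wait until minute 8 for order o1, so o1 is delivered at minute 13 and o2 at minute 23, which is also the completion time.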

Therefore, we can calculate the revenue of each order \(o\in O_{{\mathbf {x}}}\) as follows:

$$\begin{aligned} f(o) = {\left\{ \begin{array}{ll} v_o+\rho (o)e, &{} d(o)< \widehat{t}_o\\ (1-e_1)v_o, &{} \widehat{t}_o \le d(o) \le \widehat{t}_o + 15\\ (1-e_2)v_o, &{} \widehat{t}_o\!+\!15 < d(o) \le \widehat{t}_o\!+\!30\\ -e_3v_o, &{} {d(o)>\widehat{t}_o+30} \end{array}\right. } \end{aligned}$$
(8)

Here, we specify time in minutes, and hence the number of orders per hour is

$$\begin{aligned} n_p = 60|O_{\mathbf {x}}|/t(y_{l(\mathbf {y})}) \end{aligned}$$
(9)

And the additional penalty/reward of a solution is calculated as follows:

$$\begin{aligned} g(\mathbf {x},\mathbf {y},\mathbf {z}) = {\left\{ \begin{array}{ll} -\epsilon _1(\underline{n}_a-n_p), &{} n_p< \underline{n}_a\\ \epsilon _2|O_\mathbf {x}|, &{} n^\dag _a\le n_p <n^\ddag _a\\ \epsilon _3|O_\mathbf {x}|, &{} n_p\ge n^\ddag _a\\ 0, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(10)

The problem objective is to maximize the revenue per unit time, i.e., the ratio of the total revenue to the completion time \(t(y_{l(\mathbf {y})})\):

$$\begin{aligned} \max F(\mathbf {x},\mathbf {y},\mathbf {z}) = \frac{\big (\sum _{o\in O_{{\mathbf {x}}}}f(o)\big )+g(\mathbf {x},\mathbf {y},\mathbf {z})}{t(y_{l(\mathbf {y})})+\epsilon } \end{aligned}$$
(11)

where \(\epsilon \) is a very small number to avoid division by zero (i.e., if the deliveryman does not select any order, the objective function value should be zero).
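To make the revenue calculation concrete, Eqs. (8)–(11) can be written as a few small Python functions. This is only a sketch: the function and parameter names are hypothetical, and the numerical values in the example (fees, percentages, thresholds) are invented for illustration and are not taken from any platform or from our experiments.

```python
def order_revenue(d, deadline, v, rho, e, e1, e2, e3):
    """Expected revenue f(o) of one order, Eq. (8); times in minutes."""
    if d < deadline:
        return v + rho * e            # on time: fee plus expected five-star reward
    if d <= deadline + 15:
        return (1 - e1) * v           # first overdue level
    if d <= deadline + 30:
        return (1 - e2) * v           # second overdue level
    return -e3 * v                    # third level: no fee, extra penalty


def orders_per_hour(num_orders, completion_time):
    """n_p in Eq. (9), with the completion time given in minutes."""
    return 60 * num_orders / completion_time


def workload_adjustment(num_orders, n_p, n_low, n_r1, n_r2, eps1, eps2, eps3):
    """Additional penalty/reward g in Eq. (10)."""
    if n_p < n_low:
        return -eps1 * (n_low - n_p)
    if n_r1 <= n_p < n_r2:
        return eps2 * num_orders
    if n_p >= n_r2:
        return eps3 * num_orders
    return 0.0


def objective(revenues, adjustment, completion_time, eps=1e-9):
    """Revenue per unit time F in Eq. (11); eps guards the empty solution."""
    return (sum(revenues) + adjustment) / (completion_time + eps)


# Hypothetical example: two orders with a 10-yuan fee and a deadline at minute 60,
# delivered at minutes 50 and 70; the path is completed at minute 70.
params = dict(v=10, rho=0.5, e=2, e1=0.2, e2=0.5, e3=0.5)
revenues = [order_revenue(d, 60, **params) for d in (50, 70)]
n_p = orders_per_hour(len(revenues), 70)
g = workload_adjustment(len(revenues), n_p, n_low=1.5, n_r1=3, n_r2=4,
                        eps1=2.0, eps2=0.5, eps3=1.0)
F = objective(revenues, g, 70)
```

Here the first order earns the full fee plus the expected five-star reward, the second falls into the first overdue level, and the hourly order rate lies between the lower limit and the first reward threshold, so no adjustment applies.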

Constraints

We specify the following constraints for the problem:

  • For each selected order o, the delivery path must contain its pickup point \(p_o\) and service point \(s_o\), and the (first) occurrence of \(p_o\) should be before that of \(s_o\):

    $$\begin{aligned} p_o\in \mathbf {y}\wedge s_o\in \mathbf {y} \wedge \textit{ind}(p_o,\mathbf {y})\!<\! \textit{ind}(s_o,\mathbf {y}), \quad \forall o\in O_{{\mathbf {x}}}\nonumber \\ \end{aligned}$$
    (12)

    where \(\textit{ind}(i, \mathbf {y})\) denotes the index of element i in sequence \(\mathbf {y}\) (if the element occurs multiple times, it returns the first index).

  • The decision \(z(y_j)\) at each pickup point is not larger than the cardinality of \(O(y_j)\):

    $$\begin{aligned} 1\le z(y_j)\le |O(y_j)|, \quad \forall \text { pickup point }y_j\text { in }\mathbf {y} \end{aligned}$$
    (13)

  • The total travel time cannot exceed the maximum travel time \(\widehat{T}\):

    $$\begin{aligned} \varDelta t(p_0,y_1)+\sum _{j=1}^{l(\mathbf {y})\!-\!1} \varDelta t(y_j,y_{j+1})\le \widehat{T} \end{aligned}$$
    (14)
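The three constraints can be checked mechanically for a candidate solution. The following Python sketch does so under assumptions that are ours for the example only: the path is a list of `(kind, point)` pairs, stores and customers carry distinct labels, `z` holds one decision per pickup visit, and the name `is_feasible` is hypothetical.

```python
from collections import Counter


def is_feasible(path, z, orders, travel, max_travel, start="p0"):
    """Check constraints (12)-(14) for a path with pickup decisions.

    path   : list of (kind, point), kind "P" = pickup point, "S" = service point
    z      : one pickup decision per "P" entry of path, in visiting order
    orders : dict order_id -> (store, customer) for the selected orders
    """
    points = [p for _, p in path]

    # Constraint (12): both points occur, first pickup before first service.
    for store, customer in orders.values():
        if store not in points or customer not in points:
            return False
        if points.index(store) >= points.index(customer):
            return False

    # Constraint (13): 1 <= z(y_j) <= |O(y_j)| at every pickup visit.
    remaining = Counter(store for store, _ in orders.values())
    pickups = [p for kind, p in path if kind == "P"]
    if len(pickups) != len(z):
        return False
    for store, k in zip(pickups, z):
        if not 1 <= k <= remaining[store]:
            return False
        remaining[store] -= k

    # Constraint (14): total travel time within the limit.
    total, here = 0.0, start
    for p in points:
        total += travel(here, p)
        here = p
    return total <= max_travel


# Toy instance on a line (travel time = distance in minutes).
pos = {"p0": 0, "A": 5, "c1": 10, "B": 12, "c2": 20}
travel = lambda a, b: abs(pos[a] - pos[b])
orders = {"o1": ("A", "c1"), "o2": ("B", "c2")}
path = [("P", "A"), ("S", "c1"), ("P", "B"), ("S", "c2")]
ok = is_feasible(path, [1, 1], orders, travel, max_travel=25)
too_far = is_feasible(path, [1, 1], orders, travel, max_travel=15)
```

The path in the example needs 20 min of travel, so it is feasible for a limit of 25 min and infeasible for a limit of 15 min.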

Data-driven machine learning for estimating order ready time and customer satisfaction level

As aforementioned, for the considered problem, we identify two uncertain factors, i.e., the actual order ready time and customer satisfaction level, which are regarded as main challenges in order selection and delivery routing [9]. We employ a data-driven, machine-learning approach to provide more accurate predictions to address these challenges based on historical habit data of stores and customers, i.e., the overdue records of stores and five-star scoring records of customers.

To predict the actual order ready time, we consider the following influence factors of the corresponding store:

  • Expected ready time \(r_o\) of the current order given by the store;

  • Number of orders whose ready times are overdue in the recent month;

  • Percentage of orders whose ready times are overdue in the recent month;

  • Maximum, minimum, median, and standard deviation of the overdue time of the overdue orders in the recent month;

  • Number of orders whose ready times are overdue in the recent 3 days;

  • Percentage of orders whose ready times are overdue in the recent 3 days;

  • Maximum, minimum, median, and standard deviation of the overdue time of the overdue orders in the recent 3 days;

  • Number of orders that are accepted by the store and to be delivered in the next hour.

We construct a three-layer, feed-forward artificial neural network (ANN) to predict the actual order ready time based on the above 14 inputs. The training data are limited to the most recent month, as the habits of a takeaway store often change.
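For illustration, the forward pass of such a three-layer network (input, one hidden layer, output) over the 14 store features can be sketched in plain Python. The hidden-layer size, the tanh activation, the random initialization, and all feature values below are assumptions made for this example; the source does not specify them, and training is omitted.

```python
import math
import random


def init_mlp(n_in, n_hidden, seed=0):
    """Randomly initialize a network with one hidden layer and one output."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(params, x):
    """Map a feature vector to a single real-valued prediction."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# The 14 inputs in the order listed above (all values are made up).
features = [
    20.0,                  # expected ready time r_o (min)
    12, 0.08,              # overdue count and percentage, recent month
    25.0, 1.0, 6.0, 4.5,   # max, min, median, std of overdue time, recent month
    2, 0.10,               # overdue count and percentage, recent 3 days
    9.0, 2.0, 5.0, 3.0,    # max, min, median, std of overdue time, recent 3 days
    7,                     # accepted orders to be delivered in the next hour
]
params = init_mlp(n_in=len(features), n_hidden=8)
prediction = forward(params, features)   # untrained, so not yet meaningful
```

With untrained weights the output carries no meaning; the sketch only shows how the 14 inputs enter the network.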

To predict the probability of five-star scoring on the order, we consider the following influence factors of the corresponding customer:

  • Calculated delivery time d(o) of the current order;

  • Number of orders placed by the customer in the recent 3 months, recent week, and recent day;

  • Percentage of orders receiving five-star scores to all orders that are not overdue in the recent 3 months, recent week, and recent day;

  • Number of orders received or to be received by the customer 1 h before and after.

Similarly, we construct an ANN to calculate the probability based on the above eight inputs. The training data are limited to the recent 3 months.

Fig. 4 The flowchart of the hybrid evolutionary algorithm

A hybrid evolutionary algorithm for the problem

Due to the complex combinatorial nature of the considered problem, we propose a hybrid evolutionary algorithm, which consists of a main procedure for optimizing the solution to the main order selection problem and a subprocedure for optimizing path planning for each main solution. The flowchart of the algorithm can be described by the following steps (as illustrated in Fig. 4):

  (1) Randomly initialize a population of order selection solutions;

  (2) For each order selection solution \(\mathbf {x}\) in the population do:

    (2.1) Use a greedy method to generate an initial path \(\mathbf {y}\) together with pickup decisions \(\mathbf {z}\);

    (2.2) Use the subprocedure to iteratively improve the path and decisions;

    (2.3) Evaluate the fitness of \(\mathbf {x}\) based on the best path and decisions found so far;

  (3) Use the main procedure to evolve the order selection solutions;

  (4) Repeat steps (2) and (3) until the stopping condition is satisfied.
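The two-level structure of steps (1)–(4) can be summarized by the following Python skeleton. It is a structural sketch only: the four callables are placeholders, and the toy versions used in the example (a trivial path builder, an identity improvement step, and a simple mutate-and-keep-the-best evolution step) are stand-ins that we introduce for illustration; they are not the greedy method, the tabu search, or the WWO operators of this paper.

```python
import random


def hybrid_search(n, greedy_init, improve, evaluate, evolve,
                  pop_size=10, generations=20, seed=0):
    """Outer loop over order selections x, inner optimization of (y, z)."""
    rng = random.Random(seed)
    # Step (1): random binary order-selection vectors
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best_x, best_fit = None, float("-inf")
    for _ in range(generations):                       # Step (4)
        fitness = []
        for x in population:                           # Step (2)
            y, z = greedy_init(x)                      # Step (2.1)
            y, z = improve(x, y, z)                    # Step (2.2)
            fit = evaluate(x, y, z)                    # Step (2.3)
            fitness.append(fit)
            if fit > best_fit:
                best_x, best_fit = list(x), fit
        population = evolve(population, fitness, rng)  # Step (3)
    return best_x, best_fit


# Toy stand-ins (hypothetical): select at most three of five orders by value.
values = [4, 3, 5, 2, 6]

def toy_init(x):
    y = [k for k, bit in enumerate(x) if bit]   # visit selected orders in index order
    return y, [1] * len(y)

def toy_improve(x, y, z):
    return y, z                                  # no path improvement in the toy

def toy_evaluate(x, y, z):
    return sum(values[k] for k in y) - 10 * max(0, len(y) - 3)

def toy_evolve(population, fitness, rng):
    ranked = [x for _, x in sorted(zip(fitness, population),
                                   key=lambda pair: pair[0], reverse=True)]
    survivors = ranked[: len(ranked) // 2]
    children = []
    for x in survivors:                          # one random bit flip per child
        child = list(x)
        k = rng.randrange(len(child))
        child[k] = 1 - child[k]
        children.append(child)
    return survivors + children


best_x, best_fit = hybrid_search(5, toy_init, toy_improve, toy_evaluate, toy_evolve)
```

The point of the skeleton is that every fitness evaluation of an order selection triggers one run of the path subprocedure, which is why the subprocedure has to be fast.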

For the subprocedure that optimizes the delivery path and the pickup decisions, we propose a heuristic method based on tabu search [15, 16], which is much faster than population-based evolutionary algorithms; speed matters here because the subprocedure is invoked once for every evaluation of a main solution. For the main procedure that optimizes solutions to the main order selection problem, we have tested a set of popular evolutionary algorithms, and found that the water wave optimization (WWO) metaheuristic [50] exhibits performance advantages over the others on the test instances. We describe the tabu search subprocedure (including the greedy initialization method) and the main procedure in detail in the following two subsections, respectively.

Tabu search for path planning

Given a main order selection solution \(\mathbf {x}\), we first use the greedy method to produce an initial subsolution \((\mathbf {y},\mathbf {z})\) of path with pickup decisions as follows:

  (1) Initialize an empty sequence for \(\mathbf {y}\) and an empty set \(\varOmega \) of picked-up orders;

  (2) Choose the pickup point y closest to the deliveryman and add it to the sequence, and calculate the arrival time t(y);

  (3) Set the pickup decision z(y) as follows:

    (3.1) If \(|O(y)|=1\), i.e., O(y) has only one order denoted by \(o_y\), then set \(z(y)=1\);

    (3.2) Else, find the last order \(o^\dag _y\) whose ready time is not later than t(y), let \(j^\dag _y\) be the index of \(o^\dag _y\) in O(y), and set z(y) to a random integer in \([j^\dag _y,|O(y)|]\);

    (3.3) Set \(t'(y)=\max \big (t(y), \widehat{r}(O(y)[z(y)])\big )\), and then move the first z(y) orders from O(y) to \(\varOmega \); if O(y) becomes empty, remove y from the candidate pickup points;

  (4) From all candidate pickup points and those service points that are related to at least one order in \(\varOmega \), choose the point y closest to the deliveryman and add it to the sequence, calculate the arrival time t(y), and

    (4.1) If y is a pickup point, go to Step (3);

    (4.2) If y is a service point, for all orders \(o\in \varOmega \) with \(s_o=y\), set \(d(o)=t(y)\), remove these orders from \(O'(y)\), and set \(t'(y)=t(y)\); if \(O'(y)\) becomes empty, remove y from the candidate service points;

  (5) Repeat Step (4) until \(\varOmega =O_{\mathbf {x}}\).
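A possible reading of the greedy steps above is sketched below in Python. Several details are our assumptions for the sketch rather than part of the method description: the current location is excluded from the candidates so that the deliveryman does not immediately revisit the same store, at least one order is taken at every pickup visit even if none is ready on arrival, only customers of orders that are picked up but not yet delivered are candidate service points, and the loop runs until every selected order is delivered. The function name and data layout are hypothetical.

```python
import random


def greedy_init(orders, travel, start="p0", rng=None):
    """Nearest-point construction of a path y with pickup decisions z.

    orders : dict order_id -> (store, customer, predicted_ready_time)
    travel : function (a, b) -> travel time in minutes
    Returns (path, z, delivery_times).
    """
    rng = rng or random.Random(0)
    waiting = {}                                   # O(y) per store
    for oid, (store, _, _) in orders.items():
        waiting.setdefault(store, []).append(oid)
    for queue in waiting.values():
        queue.sort(key=lambda oid: orders[oid][2])

    carried, delivered = set(), {}                 # picked up / delivered orders
    path, z = [], []
    here, now = start, 0.0
    while len(delivered) < len(orders):
        candidates = [("P", s) for s in waiting]
        candidates += [("S", c) for c in sorted({orders[o][1] for o in carried})]
        # assumption: do not pick the current location again if alternatives exist
        candidates = [c for c in candidates if c[1] != here] or candidates
        kind, point = min(candidates, key=lambda kp: travel(here, kp[1]))
        arrive = now + travel(here, point)
        path.append((kind, point))
        if kind == "P":
            queue = waiting[point]
            if len(queue) == 1:                    # Step (3.1)
                k = 1
            else:                                  # Step (3.2)
                ready_now = sum(1 for o in queue if orders[o][2] <= arrive)
                k = rng.randint(max(1, ready_now), len(queue))
            now = max(arrive, orders[queue[k - 1]][2])   # Step (3.3)
            carried.update(queue[:k])
            del queue[:k]
            if not queue:
                del waiting[point]
            z.append(k)
        else:                                      # Step (4.2)
            for oid in sorted(o for o in carried if orders[o][1] == point):
                delivered[oid] = arrive
                carried.discard(oid)
            now = arrive
        here = point
    return path, z, delivered


# Toy instance on a line (travel time = distance in minutes).
pos = {"p0": 0, "A": 5, "c1": 10, "B": 12, "c2": 20}
travel = lambda a, b: abs(pos[a] - pos[b])
orders = {"o1": ("A", "c1", 8), "o2": ("B", "c2", 10)}
path, z, delivered = greedy_init(orders, travel)
```

In the toy instance each store has a single order, so no random choice is made and the path is A, c1, B, c2 with deliveries at minutes 13 and 23.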

From the initial \((\mathbf {y},\mathbf {z})\), we use tabu search, which iteratively searches the neighborhood of the subsolution and moves to the best neighboring subsolution that either is better than the current one or is not tabu. As the subsolution consists of two parts, the path and the pickup decisions, we consider two types of neighborhood search. The first type conducts point swapping operations on the path \(\mathbf {y}\). Considering the problem constraints, we design the following four swapping operations:

  (a) Swap two adjacent pickup points \(p_1\) and \(p_2\), as illustrated by Fig. 5a.

  (b) Swap a service point s and a subsequent pickup point p; in particular, if s has occurred again after p as a service point of the order from p but not as a service point of that from any other pickup point after p, then the next occurrence of s will be removed, as illustrated by Fig. 5b.

  (c) Swap a pickup point p and a subsequent service point s, if s is a service point of the order from another pickup point before p; however, if s is also a service point of the order from p, s will be reinserted after p, as illustrated by Fig. 5c.

  (d) Swap two service points \(s_1\) and \(s_2\) that have no pickup point between them, as illustrated by Fig. 5d.

In addition, if a swapping operation involves a pickup point, Step (3) of the greedy initialization method is employed to reset the pickup decision on the point, and each service point related to a decreased decision is moved to a random position after the corresponding order is picked up.

The second type of neighborhood search modifies pickup decisions \(\mathbf {z}\) by randomly choosing a pickup point y satisfying \(|O(y)|\!>\!j^\dag _y\) and changing z(y) to another random value in \([j^\dag _y,|O(y)|]\); this can be divided into two cases:

  1. (a)

    z(y) is increased; in this case, if \(z(y)=|O(y)|\), the later occurrence(s) of y in \(\mathbf {y}\) will be removed.

  2. (b)

    z(y) is decreased; in this case, for each later occurrence of \(y'\) in \(\mathbf {y}\), Step (3) of the greedy initialization method is employed to reset the corresponding pickup decision, and each service point related to a decreased decision is moved to a random position after the corresponding order is picked up.

Fig. 5
figure 5

Illustration of the four specific swapping operations used in tabu search

Algorithm 1 presents the pseudo-code of tabu search, where \(t_{\max }\) is the maximum number of iterations, \(n_b\) is the neighborhood size (i.e., the number of neighbors generated at each iteration), \(\textit{tlen}\) is the maximum tabu length, and \(\textit{rnd}()\) produces a random number in [0,1].

figure a
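The overall loop described above can be sketched in Python as follows. This is a generic tabu search skeleton under stated assumptions, not a transcription of Algorithm 1: the neighbor generator and objective are passed in as hypothetical callables, whole subsolutions are stored in the tabu list, the use of \(\textit{rnd}()\) is not modeled, and the toy path objective at the end is invented purely for demonstration.

```python
import random
from collections import deque


def tabu_search(initial, evaluate, random_neighbor, t_max=30, n_b=10, tlen=7, seed=0):
    """Maximize evaluate() from `initial` using a bounded tabu list."""
    rng = random.Random(seed)
    current, current_val = initial, evaluate(initial)
    best, best_val = current, current_val
    tabu = deque(maxlen=tlen)  # oldest entries fall out automatically
    for _ in range(t_max):
        candidates = []
        for _ in range(n_b):
            cand = random_neighbor(current, rng)
            val = evaluate(cand)
            # A neighbor is admissible if it is not tabu or beats the current one.
            if cand not in tabu or val > current_val:
                candidates.append((val, cand))
        if not candidates:
            continue
        val, cand = max(candidates, key=lambda vc: vc[0])
        tabu.append(current)  # forbid returning to the subsolution just left
        current, current_val = cand, val
        if current_val > best_val:
            best, best_val = current, current_val
    return best, best_val


def swap_adjacent(path, rng):
    """Toy neighbor: swap two adjacent points of a tuple-encoded path."""
    i = rng.randrange(len(path) - 1)
    p = list(path)
    p[i], p[i + 1] = p[i + 1], p[i]
    return tuple(p)


def neg_length(path):
    """Toy objective: negative travel distance along points on a line."""
    return -sum(abs(a - b) for a, b in zip(path, path[1:]))


start = (5, 1, 4, 2, 3)
best_path, best_value = tabu_search(start, neg_length, swap_adjacent)
```

Because the best subsolution seen so far is tracked separately from the current one, the search may accept a worse admissible neighbor without losing the best result.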

Water wave optimization for order selection

For order selection optimization, we propose an evolutionary algorithm based on the WWO metaheuristic [50], which takes inspiration from shallow water wave models for solving optimization problems. In particular, WWO has demonstrated superior performance on a variety of selection problems whose solution spaces have the same or a similar structure [3, 8, 24, 41, 49, 53]. In WWO, each solution is analogous to a wave and is assigned a wavelength inversely proportional to the solution fitness. The key principle of WWO is that the higher (lower) the solution fitness, the smaller (larger) the wavelength, and the smaller (larger) the range the solution explores (as illustrated in Fig. 6), which results in a good balance between global search and local search.

Fig. 6
figure 6

Wave propagation in WWO [50]

WWO starts by initializing a population of solutions, which are then evolved by three operators named propagation, refraction, and breaking. As the original WWO was proposed for continuous optimization, here we need to adapt the algorithm to evolve solutions in the discrete search space [52]. First, we adapt the propagation operator to perform a number of local search steps on each solution \(\mathbf {x}\), where each local search step changes a random dimension \(x_k\) from 0 to 1 or from 1 to 0. The maximum number of local search steps is controlled by the wavelength \(\lambda (\mathbf {x})\), which is an integer calculated as

$$\begin{aligned} \lambda (\mathbf {x})= \lceil n^{(f(\mathbf {x})-f_{\min }+\epsilon )/(f_{\max }-f_{\min }+\epsilon )} \rceil \end{aligned}$$
(15)

where \(\lceil \cdot \rceil \) denotes rounding to the nearest integer, and \(f_{\max }\) and \(f_{\min }\) are the maximum and minimum fitness among the population, respectively.

After propagation, if the new solution is better than the original one, it will replace the original one in the population.
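The wavelength calculation and the adapted propagation step can be sketched in Python as follows. The sketch evaluates Eq. (15) literally for whatever fitness values the caller supplies, and reads the ceiling brackets as rounding up (replace `math.ceil` with `round` if nearest-integer rounding is intended). Drawing the number of flips uniformly from 1 to \(\lambda (\mathbf {x})\), the value of `eps`, and the toy objective with its fees are assumptions made for this illustration.

```python
import math
import random


def wavelength(f_x, f_min, f_max, n, eps=1e-6):
    """Integer wavelength from Eq. (15); eps avoids division by zero."""
    exponent = (f_x - f_min + eps) / (f_max - f_min + eps)
    return math.ceil(n ** exponent)


def propagate(x, lam, evaluate, rng):
    """Flip up to `lam` random bits; keep the result only if it is better."""
    y = list(x)
    for _ in range(rng.randint(1, lam)):  # assumption: step count uniform in [1, lam]
        k = rng.randrange(len(y))
        y[k] = 1 - y[k]
    return y if evaluate(y) > evaluate(x) else list(x)


# Hypothetical toy objective: total fee of selected orders, zero if more than 3 are chosen.
fees = [4.0, 6.5, 3.0, 5.0, 7.5, 2.0]


def toy_revenue(x):
    return sum(f for f, bit in zip(fees, x) if bit) if sum(x) <= 3 else 0.0


rng = random.Random(1)
population = [[rng.randint(0, 1) for _ in fees] for _ in range(5)]
values = [toy_revenue(x) for x in population]
f_min, f_max = min(values), max(values)
new_population = []
for x, f_x in zip(population, values):
    lam = wavelength(f_x, f_min, f_max, n=len(fees))
    new_population.append(propagate(x, lam, toy_revenue, rng))
```

Since `propagate` returns the original solution whenever the perturbed one is not better, no member of the population gets worse after a propagation round.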

Second, we adapt the breaking operator on each newly found best solution \(\mathbf {x}^*\) by generating \(n^*\) one-step neighboring solutions around \(\mathbf {x}^*\). Here we introduce an adaptive method for controlling the number \(n^*\) of neighboring solutions as follows:

$$\begin{aligned} n^*= \textit{rnd\_int}\left( 1, \widehat{n}\frac{f_{\text {old}}^*+\epsilon }{f(\mathbf {x}^*)+\epsilon }\right) \end{aligned}$$
(16)

where \(\widehat{n}\) is a control parameter, and \(f^*_{\text {old}}\) is the objective function value of the old best solution. In this way, the greater the improvement of the new best solution over the old one, the larger the number of neighboring solutions exploited.
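Equation (16) and the generation of one-step neighbors can be sketched in Python as follows. The sketch evaluates the formula as printed; treating \(\textit{rnd\_int}(a,b)\) as a uniform integer draw from the inclusive range, truncating a fractional upper bound (with a floor of 1), and taking a one-step neighbor to be a single bit flip are assumptions made for this illustration, as are the function names.

```python
import random


def breaking_count(f_old_best, f_new_best, n_hat=12, eps=1e-6, rng=random):
    """Number of neighbors n* drawn according to Eq. (16)."""
    upper = n_hat * (f_old_best + eps) / (f_new_best + eps)
    upper = max(1, int(upper))  # assumption: truncate, but never below 1
    return rng.randint(1, upper)


def break_wave(x_best, count, rng):
    """Generate `count` neighbors of x_best, each differing in exactly one bit."""
    neighbors = []
    for _ in range(count):
        y = list(x_best)
        k = rng.randrange(len(y))
        y[k] = 1 - y[k]
        neighbors.append(y)
    return neighbors


rng = random.Random(0)
x_star = [1, 0, 1, 0]
# Hypothetical objective values of the old and new best solutions.
count = breaking_count(8.0, 10.0, rng=rng)
neighbors = break_wave(x_star, count, rng)
```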

Following the work of simplified WWO [55], we replace the refraction operator with a population reduction strategy in order to accelerate convergence. The strategy iteratively decreases the population size \(\textit{NP}\) from an upper limit \(\textit{NP}_{\max }\) to a lower limit \(\textit{NP}_{\min }\) as follows:

$$\begin{aligned} \textit{NP}_g= \textit{NP}_{\max }-(\textit{NP}_{\max }-\textit{NP}_{\min })\frac{g}{g_{\max }} \end{aligned}$$
(17)

where g is the current number of generations (or function evaluations), and \(g_{\max }\) is the maximum allowable number of the generations (or function evaluations). Whenever the size is decreased by one, the worst solution in the population is removed.
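The linear schedule of Eq. (17) can be sketched in Python as follows. Rounding the result to an integer and removing several of the worst solutions at once when the target size drops by more than one are assumptions made for this illustration; the function names are hypothetical.

```python
def population_size(g, g_max, np_max=30, np_min=6):
    """Target population size at generation (or evaluation count) g, per Eq. (17)."""
    size = np_max - (np_max - np_min) * g / g_max
    return max(np_min, round(size))  # assumption: round to the nearest integer


def shrink(population, fitness, target_size):
    """Keep the `target_size` fittest solutions (the worst ones are removed)."""
    ranked = sorted(population, key=fitness, reverse=True)
    return ranked[:target_size]


sizes = [population_size(g, 4000) for g in (0, 2000, 4000)]
survivors = shrink([3, 1, 2, 5], fitness=lambda v: v, target_size=2)
```

With \(\textit{NP}_{\max }=30\) and \(\textit{NP}_{\min }=6\), the schedule starts at 30, passes 18 at the halfway point, and ends at 6.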

Algorithm 2 presents the pseudo-code of the WWO algorithm with adaptive breaking (denoted by WWO-AB) for the main problem of order selection.

figure b

Computational experiments

Experimental results of machine learning

We train the ANNs to predict the two uncertain factors for each order based on historical data of two popular food delivery applications. For predicting order ready time, we use a dataset of 1330 samples related to 121 takeaway stores. For predicting five-star scoring probability, we use a dataset of 1750 samples related to 204 customers. We use a fivefold cross-validation, that is, we partition each dataset into five equal-size pieces and run validation five times, each using four pieces as the training set and the remaining piece as the test set.
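The fivefold partitioning can be sketched in Python as follows. Shuffling the sample indices with a fixed seed before splitting is an assumption made for this illustration, and `five_fold_splits` is a hypothetical helper name.

```python
import random


def five_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # assumption: random assignment to folds
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Dataset sizes from the text: 1330 ready-time samples, 1750 scoring samples.
ready_time_splits = list(five_fold_splits(1330))
scoring_splits = list(five_fold_splits(1750))
```

For the two datasets this gives test folds of 266 and 350 samples, respectively, with the remaining four folds used for training.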

We also employ WWO to tune the ANN parameters [56], and compare the performance of ANN with linear regression and logistic regression. Figure 7 presents the root mean squared errors (RMSE) of the three models as well as the RMSE of the order ready time estimated by the store. The results show that the average deviation of the order ready time estimated by the store from the actual order ready time is about 15.37 min, which will not only delay the delivery of the current order, but also have a knock-on effect on all remaining orders. The three machine-learning models utilize historical data to predict the order ready time. However, the error of the linear regression model is only slightly lower than that of the manual estimation. The logistic regression model is more accurate than the linear one, but its average deviation is still more than 11 min. Compared to the two regression models, the ANN achieves a significantly lower error of 7.88 min, which can effectively reduce the side effect on the delivery plan.
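For reference, the RMSE used in this comparison can be computed as in the following Python sketch. The function name and the sample ready times (in minutes) are hypothetical and serve only to show the calculation.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical ready times in minutes, not taken from the datasets.
example_error = rmse([13.0, 17.0], [10.0, 20.0])
```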

For five-star scoring probability prediction, we use two metrics. The first is the success rate, i.e., the percentage of successful predictions, where a probability larger than 0.5 for a five-star scoring or a probability smaller than 0.5 for a non-five-star scoring is considered successful. The second is the deviation of the sum of probabilities from the actual number of five-star scorings. Figure 8 presents the results, where the orange line denotes the actual number (646) of five-star scorings. The results show that ANN achieves the highest success rate, while the success rates of the two regression models are not much lower. However, the differences among the deviations of the number of five-star scorings obtained by the three models are relatively large. The linear regression model overestimates the number by 126, and the logistic regression model underestimates it by 73; in comparison, ANN underestimates it by only 24, and such a small deviation will make the calculation of the revenue of the deliveryman (i.e., the objective function of the problem) much more accurate.
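The two metrics can be sketched in Python as follows. The function names and the four sample predictions are hypothetical; following the definition above, a predicted probability of exactly 0.5 counts as unsuccessful in either case.

```python
def success_rate(probs, is_five_star):
    """Fraction of predictions on the correct side of 0.5."""
    hits = sum(
        bool((p > 0.5 and y) or (p < 0.5 and not y))
        for p, y in zip(probs, is_five_star)
    )
    return hits / len(probs)


def count_deviation(probs, is_five_star):
    """Sum of predicted probabilities minus the actual number of five-star scorings.

    Positive values mean overestimation, negative values underestimation.
    """
    return sum(probs) - sum(1 for y in is_five_star if y)


# Hypothetical predictions and outcomes for four orders.
probs = [0.9, 0.4, 0.6, 0.2]
labels = [True, False, False, False]
rate = success_rate(probs, labels)
deviation = count_deviation(probs, labels)
```

The two metrics are complementary: the success rate only checks which side of 0.5 each prediction falls on, whereas the deviation measures how well the probabilities are calibrated in aggregate, which is what matters for the expected revenue.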

Fig. 7
figure 7

RMSE of order ready times predicted by the three machine-learning models and the stores

Fig. 8
figure 8

Success rate and deviation of the number of five-star scoring of the three machine-learning models

Experimental results of evolutionary optimization

To test the performance of solving the takeaway order selection and delivery path planning problem, we construct a test set of 11 instances, which are generated based on historical data of two popular food delivery applications. Table 2 lists the numbers of orders and points of each instance, which indicate the size and difficulty of the instance. Some other important parameters of the instances are set as \(e_1\!=\!0.3\), \(e_2\!=\!0.5\), \(e_3\!=\!0.7\), \(e\!=\!1\), \(\epsilon _1\!=\!0.5\), \(\epsilon _2\!=\!0.5\), \(\epsilon _3\!=\!1\), \(\underline{n}_a\!=\!5\), \(n^\dag _a\!=\!12\), and \(n^\ddag _a\!=\!16\).

Table 2 Numbers of orders and points of the problem instances

To validate the performance of the proposed WWO-AB algorithm, we compare it with the following eight popular metaheuristic evolutionary algorithms for subset selection optimization:

  • GA [10];

  • ACO [21];

  • PSO [5];

  • Differential evolution (DE) [2];

  • Biogeography-based optimization (BBO) [34, 47];

  • Ecogeography-based optimization (EBO) [51];

  • Artificial algae algorithm (AAA) [45];

  • Basic WWO, where the number of neighboring solutions generated by a breaking operation is a random value between 1 and a fixed threshold [41].

Table 3 Comparative results on the test instances

WWO-AB and the eight comparative algorithms invoke the same tabu search procedure given in Algorithm 1 for path planning optimization for each main solution. The control parameters of tabu search are set as \(\textit{tlen}=7\), \(n_b=10\), and \(t_{\max }=10\) for instances #1 and #2, 20 for #3–#5, and 30 for #6–#11. The control parameters of the nine evolutionary algorithms are first set as suggested in the literature and then tuned on the whole test set. For WWO-AB, the control parameters are set as \(\textit{NP}_{\max }=30\), \(\textit{NP}_{\min }=6\), and \(\widehat{n}=12\). The computational environment is a computer with an Intel Core i7-8700 3.20 GHz CPU and 16 GB of DDR4 memory. For a fair comparison, all algorithms use the same stopping condition, namely that the number of fitness evaluations reaches the maximum allowable number, which is set to 4000 for instances #1–#3, 8000 for #4–#6, 12000 for #7 and #8, and 16000 for #9–#11. In this setting, the CPU time consumed to solve the largest-size instance #11 is less than one second, which makes it practical to employ the algorithms to work out order selection and path planning solutions for deliverymen.

On each test instance, each algorithm is run 30 times, and the performance is evaluated based on the results over the 30 runs. Table 3 presents the median (med) and standard deviation (std) of the objective function values obtained by the algorithms on each test instance. For each instance, the best median value among the nine algorithms is shown in bold. We conduct Wilcoxon rank sum tests to compare the result of WWO-AB with that of each other algorithm, and use a superscript \(^\dag \) before the median value of the corresponding algorithm to indicate that there is a statistically significant difference (at 95% confidence level). Moreover, we present the median, maximum, minimum, first quartile (25%) and third quartile (75%) of the objective function values obtained by each algorithm among 30 runs on each instance in the box plots in Fig. 9.
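The per-instance summary statistics and the significance test can be sketched in Python as follows, using only the standard library. This is an illustration under stated assumptions rather than the exact procedure used for Table 3: the rank sum test uses the normal approximation without tie correction, the standard deviation is the sample standard deviation, and the two sets of 30 run results are synthetic values generated only to exercise the code.

```python
import math
import random
import statistics


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank sum test (normal approximation, no tie correction)."""
    pooled = sorted((v, src) for src, sample in enumerate((a, b)) for v in sample)
    rank_sum_a, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j for tied values
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_a - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


def summarize(values):
    """Median, sample standard deviation, and quartiles of one algorithm's runs."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "med": statistics.median(values),
        "std": statistics.stdev(values),
        "min": min(values), "q1": q1, "q3": q3, "max": max(values),
    }


# Synthetic objective values for 30 runs of two hypothetical algorithms.
rng = random.Random(0)
runs_a = [100 + rng.random() for _ in range(30)]
runs_b = [90 + rng.random() for _ in range(30)]
summary_a = summarize(runs_a)
z, p_value = rank_sum_test(runs_a, runs_b)
significant = p_value < 0.05  # 95% confidence level
```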

Fig. 9
figure 9

Box plots of the objective function values obtained by the nine algorithms on the test instances. Max: maximum; Min: minimum; Q1: the first quartile (25%); Q3: the third quartile (75%)

Fig. 10
figure 10

Revenues obtained by the method with machine learning and the method without

Among the 11 test instances, WWO-AB obtains the best median value on 10 instances, the exception being instance #5. On the smallest-size instance #1, WWO-AB, BBO, EBO, and AAA obtain the same best median value; on instance #2, WWO-AB, BBO and EBO obtain the same best median value; on instance #5, BBO obtains the best median value, while WWO-AB obtains the second best; on each of the remaining eight instances, WWO-AB uniquely obtains the best median value. According to the statistical test results, the results of WWO-AB are significantly better than those of GA, ACO, PSO, DE, and the basic WWO on all 11 instances, significantly better than those of AAA on nine instances, and significantly better than those of BBO and EBO on eight instances. Conversely, none of the other algorithms performs significantly better than WWO-AB on any instance. Although BBO achieves the best median value on instance #5, there is no significant difference between the results of BBO and WWO-AB on this instance.

Among the eight comparative algorithms, the overall performance of PSO is the worst, mainly because PSO's learning-from-history mechanism often causes the algorithm to be trapped in local optima. The crossover operators of GA and DE and the pheromone accumulation mechanism of ACO have similar negative effects on the search abilities of the algorithms. Therefore, in general, the performances of these four algorithms are significantly worse than those of the other five algorithms, which have special mechanisms for balancing global exploration and local exploitation. Such mechanisms include the migration operations of BBO and EBO, the helical movement of AAA, and the propagation of WWO, which can effectively maintain solution diversity and therefore suppress premature convergence. Compared to the basic WWO, WWO-AB uses adaptive breaking and population size reduction, which can further improve solution accuracy and accelerate the search process.

Moreover, the box plots in Fig. 9 and the standard deviation values in Table 3 show that, on most instances, the variance of the objective function values obtained by WWO-AB over 30 runs is much smaller than those of the other comparative algorithms. This indicates that, compared to the other algorithms, the results of the WWO-AB algorithm are more robust. That is, when a deliveryman employs an algorithm to solve a problem instance, different runs of WWO-AB typically result in similar solutions, which helps to improve user confidence in the algorithm.

In summary, WWO-AB exhibits the best overall performance among the nine algorithms on the test instances, which validates the effectiveness of the WWO evolutionary operators adapted to solve the considered takeaway order selection and delivery path planning problem. As the maximum running time to produce a solution is less than one second, it is practical for deliverymen to use the proposed WWO-AB to optimize order selection and delivery path planning to improve their performance and the corresponding revenue.

Contributions of machine learning to optimization

Finally, we test the effects of the data-driven machine learning on the revenues of deliverymen. On the 11 test instances, we compare our method, which uses machine learning to estimate order ready time and customer satisfaction level, with the method without machine learning, i.e., simply using the expected order ready time given by the store and assuming the middle-level customer satisfaction. Figure 10 compares the revenues obtained by the two methods on the instances. As we can observe, on each instance, the method using machine learning achieves a higher revenue than that without machine learning. Moreover, the higher the revenue of an instance, the larger the percentage improvement of the method using machine learning over the method without it. The results demonstrate that, by using machine learning to estimate order ready time and customer satisfaction level more accurately, the deliverymen can select the orders and plan the routes in a more cost-effective manner to improve their revenues. The average revenue obtained by the method using machine learning is 8.22 yuan per hour, significantly better than the 6.47 yuan per hour of the method without machine learning. Such a significant improvement can greatly help both the takeaway deliverymen and the company.
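As a quick check of the size of this gain, the relative improvement implied by the two reported average revenues can be worked out as

$$\begin{aligned} \frac{8.22-6.47}{6.47}=\frac{1.75}{6.47}\approx 0.27 \end{aligned}$$

that is, roughly 27% on average over the test instances.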

Conclusion

Recent years have witnessed a rapid growth of the takeaway delivery market. The increasing number of candidate orders and the corresponding pickup and service points has made order selection and path planning a key challenge for deliverymen. In this article, we formulate an integrated takeaway order selection and delivery path optimization problem, which involves uncertain order ready time and customer satisfaction level. We employ a machine-learning approach to infer the uncertain factors based on habit data of takeaway stores and customers. To efficiently solve the problem, we propose a hybrid evolutionary algorithm that adapts the WWO metaheuristic to solve the main problem of order selection and employs the tabu search method to quickly optimize the delivery path for each main solution. Experimental results on test instances constructed based on real food delivery application data demonstrate the performance advantages of the proposed algorithm compared to a set of popular evolutionary algorithms.

Our future work will be devoted to three aspects. First, the present work takes the uncertainty of order ready time into consideration, but assumes that the delivery time can be estimated accurately. However, in practice, the delivery time is also significantly affected by external factors such as traffic conditions. Therefore, we will consider the uncertain delivery time in the problem, and utilize interfaces from map navigation applications such as Baidu and Gaode to estimate the delivery time. Second, the delivery path is limited by the maximum travel distance of the vehicle (typically, an electric bicycle) used by the deliveryman. Therefore, we will consider the battery capacity and its recharging in the problem and solution method [57]. Third, we will extend the problem and algorithm to enable dynamic order selection and path re-planning for deliverymen when they are on the way.