1 Introduction

The emergence and widespread use of Mobility-on-Demand systems in recent years have had a profound impact on urban transportation in a variety of ways. Amongst other advantages, these systems have the potential to mitigate congestion costs (such as commute times, fuel usage, accident propensity, etc.), enable marketplace optimization for both passengers and drivers, and provide significant environmental benefits. A prominent example is ridesharing (Footnote 1). Ridesharing, however, also causes some passenger disruption, due to compromised flexibility, increased travel time, and loss of privacy and convenience. Thus, at the core of any ridesharing platform lies the need for an efficient balance between the incentives of the passengers and those of the platform (Footnote 2).

Optimizing the usage of transportation resources is not an easy task, especially for cities like New York, with more than 13,000 taxis and 270 ride requests per minute. For example, (Buchholz 2018) estimates that 45,000 customer requests remain unmet each day in New York, despite the fact that approximately 5,000 taxis are vacant at any given time. In fact, on aggregate, drivers spend about \(47\%\) of their time not serving any passengers (Buchholz 2018). Moreover, up to \(80\%\) of the taxi rides in Manhattan could be shared by two riders, with only a few minutes' increase in travel time (Alonso-Mora et al. 2017a). A more sophisticated matching policy could mitigate these costs by better allocating the available supply to demand. As a second example, coordinated vehicle relocation could be employed to bridge the spatial supply/demand imbalance and improve passenger satisfaction and Quality of Service (QoS) metrics. Drivers often relocate to find passengers: \(61.3\%\) of trips begin in a different neighborhood than the drop-off location of the last passenger (Buchholz 2018), yet drivers currently move without any coordinated search behavior, resulting in spatial search frictions.

Given the importance of the problem for transportation and the economy, it is not surprising that the related literature is populated with a plethora of papers, proposing different solutions along different axes, such as efficiency (Santi et al. 2014; Alonso-Mora et al. 2017a; Agatz et al. 2011; Ashlagi et al. 2017; Huang et al. 2019; Bienkowski et al. 2018; Dickerson et al. 2018; Fagnant and Kockelman 2018; Lokhandwala and Cai 2018), platform revenue (Banerjee et al. 2017; Chen et al. 2019), driver incentives (Ma et al. 2019; Yuen et al. 2019; Garg and Nazerzadeh 2020), fairness (Lesmana et al. 2019; Sühr et al. 2019; Xu and Xu 2020; Nanda et al. 2020), reliability (Fielbaum and Alonso-Mora 2020; Alonso-González et al. 2020), or analyzing the effects on sharing economies (Kooti et al. 2017; Jiang et al. 2018; Ghili and Kumar 2020; Asadpour et al. 2020).

It is well documented (e.g., (Lesmana et al. 2019)) that these different desiderata are often in conflict (e.g., fairness vs. revenue), and therefore we should not expect a single ridesharing algorithm to be superior for all of them; rather, the design of such algorithms should be contingent on the goals of the designer, and on which of those properties they consider more important for the application at hand. Thus, we want a flexible and adaptable design, able to work well with respect to any such set of objectives with ‘a few tweaks’.

To enable this, we propose a modular approach to algorithm design in ridesharing, in which an algorithm consists of three different components, namely (a) matching passengers with other passengers, (b) assigning rides to vehicles, and (c) vehicle relocation, in which taxis, when not serving passengers, move close to positions where requests are expected to appear in the near future. Each component can then be seen as a different (sub)-algorithm, and those algorithms can be chosen to be geared towards the specific objectives of the designer. In fact, our approach draws inspiration from several successful algorithms in the ridesharing literature, such as the well-known High Capacity algorithm of (Alonso-Mora et al. 2017a), or the recent algorithm of (Riley et al. 2020), both of which can be cast as examples of algorithms in this modular design setting.

Fig. 1 The three components of a CAR

1.1 Our contributions

1.1.1 CARs

We initiate the systematic study of Component Algorithms for Ridesharing (CARs). A CAR is an algorithm consisting of three sub-algorithms, each solving one of the following components of the ridesharing problem (Fig. 1).

  • Matching passengers to other passengers. For this component, the underlying algorithmic problem is that of Online Maximum Weight Matching, where the “online” part stems from the fact that passenger requests appear at different points in time, and we have to account for the future when deciding which passengers to match. As such, we have a lot of classic as well as modern matching algorithms at our disposal.

  • Assigning rides to vehicles. For this component, the underlying algorithmic problem can either be seen as an Online Maximum Weight Bipartite Matching, or as an instance of the k-Taxi problem and, by extension, of the famous k-Server problem from the literature on online algorithms. Similarly to above, there is a large set of classic and modern solutions that one can plug in as components for this part.

  • Vehicle Relocation. For this component, the objective is to use historical data to predict the locations of future requests and move idle taxis closer to those locations. From an algorithmic standpoint, this problem can be cast either as a k-Facility Location problem, concerned with the optimal placement of facilities (taxis) to minimize transportation costs, or as an Online Maximum Weight Matching problem on the history of requests.

1.1.2 Evaluation platform

While several papers in the literature provide evaluations on realistic datasets (e.g., see (Riley et al. 2020; Santi et al. 2014; Alonso-Mora et al. 2017a; Agatz et al. 2011; Santos and Xavier 2013; Danassis et al. 2019)), they either (a) only consider parts of the ridesharing problem and therefore do not propose end-to-end solutions, (b) only evaluate a few newly proposed algorithms against some basic baselines, (c) only consider a limited number of performance metrics, predominantly related to overall efficiency and often without regard to QoS metrics, or (d) perform evaluations on a much smaller scale, thus not capturing the real-life complexity of the problem. In contrast, our work provides a comprehensive evaluation of a large number of proposed algorithms, over multiple different metrics, and for real-world-scale, end-to-end problems. Specifically:

  • We meticulously design an experimental setting to resemble reality as closely as possible in every aspect of the problem. To the best of our knowledge, this is the first end-to-end experimental evaluation of this magnitude, and it could serve as common ground for evaluating future work in a setting designed to capture real-world challenges.

  • We evaluate our CARs for a host of different objectives (10 metrics) related to global efficiency, complexity, passenger, and platform incentives (see Table 2).

We focus on (shared) rides of at most two requests (i.e., vehicles of capacity two) for two reasons: complexity and passenger satisfaction, as we explain in detail in Sect. 3.2.4.

1.1.3 Results

Applying the modular approach advocated above, we design a large set of CARs, based on different classic and modern algorithms for the different components (14 in total, see Table 1). The main take-aways are the following:

  • CARs based on offline, in-batches maximum-weight matching approaches perform well on global efficiency and passenger-related metrics.

  • CARs based on k-server algorithms (e.g., the Balance algorithm (Manasse et al. 1990)) perform well on platform-related metrics.

  • Lightweight CARs perform better in real-world, large-scale settings since real-time constraints dictate short planning windows which can diminish the benefit of cumbersome optimization techniques compared to myopic approaches.

  • Simple, lightweight relocation schemes can significantly improve Quality of Service metrics by up to \(50\%\).

  • We identify a scalable, on-device CAR based on ALMA (Danassis et al. 2019) that performs well across the board.

Our findings provide convincing evidence to a ridesharing platform as to which combination of components would be most suitable for a given set of objectives.

2 Discussion and related work

The literature on ridesharing is rather extensive; here we only highlight the key algorithmic principles in our design of CARs.

The dynamic ridesharing – and the closely related dynamic dial-a-ride (see (Agatz et al. 2012)) – problem has drawn the attention of diverse disciplines over the past few years, from operations research to transportation engineering, and computer science. Solution approaches include constrained optimization (Qian et al. 2017; Simonetto et al. 2019; Agatz et al. 2011; Alonso-Mora et al. 2017a; Riley et al. 2020), weighted matching (Ashlagi et al. 2017; Bei and Zhang 2018; Dickerson et al. 2018; Zhao et al. 2019; Danassis et al. 2019), other heuristics (Qian et al. 2017; Santos and Xavier 2015; Bathla et al. 2018; Lowalekar et al. 2019; Santos and Xavier 2013; Pelzer et al. 2015; Gao et al. 2017; Shah et al. 2020), reinforcement learning (Guériau and Dusparic 2018; Li et al. 2019; He and Shin 2019), or model predictive control (Chen and Cassandras 2019; Riley et al. 2020; Tsao et al. 2019), among others. We refer the interested reader to the following surveys (Agatz et al. 2012; Silwal et al. 2019; Furuhata et al. 2013; Ho et al. 2018; Mourad et al. 2019; Cordeau and Laporte 2007) for a review on the optimization challenges, various algorithmic designs adopted over the years, a classification of existing ridesharing systems, models and algorithms for shared mobility, and finally models and solution methodologies for the dial-a-ride problem, respectively.

As mentioned in the introduction, the key algorithmic components of ridesharing are the following. First, it is an online problem, as the decisions made at some point in time clearly affect the possible decisions in the future; therefore, the literature on online algorithms and competitive analysis (Borodin and El-Yaniv 2005; Manasse et al. 1988) offers clear-cut candidates for CARs. Second, all of the components can be seen as some type of matching, both for bipartite graphs (for matching passengers with taxis, or idle taxis with ‘future’ requests) and for general graphs (for matching passengers to shared rides). In fact, several of the algorithms that have been proposed in the literature for the problem address different variants of online matching.

Finally, ridesharing displays an inherent connection to the k-taxi problem (Coester and Koutsoupias 2019; Buchbinder et al. 2020; Fiat et al. 1994; Kosoresow 1996), which, in turn, is a generalization of the well-known k-server problem (Koutsoupias and Papadimitriou 1995; Koutsoupias 2009) (Footnote 3). In the k-taxi problem, once a request appears (with a source and a destination), one of the k taxis at the platform's disposal must serve it. Viewing shared rides (multiple passengers that have already been matched in a previous step) as requests, one can clearly apply k-taxi (and k-server) algorithms to the ridesharing setting. Granted, k-server algorithms have been designed to operate in a more challenging setting in which (a) requests have to be served immediately, whereas in ridesharing there is normally some leeway in that regard, often at the expense of customer satisfaction, and (b) the positions of requests are typically adversarially chosen, rather than following some distribution, as is the case in reality. Despite this, the fundamental idea behind these algorithms is a pivotal part of ridesharing, as it aims to serve existing requests efficiently while placing the vehicles as well as possible to serve future requests. This is also the main principle behind the relocation strategies for idle taxis.

The algorithms that we consider are appropriate modifications of the most significant ones that have been proposed for the aforementioned key algorithmic primitives of the ridesharing problem, as well as heuristic approaches which are based on the same principles, but were specifically designed with the ridesharing application in mind. We emphasize that such modifications are needed, primarily because many of these algorithms were tailored for sub-problems of the ridesharing setting, and end-to-end solutions in the literature are rather scarce.

Much of the related work in the literature focuses on approaches that are inherently centralized and require knowledge of the full ridesharing network, which makes them rather computationally intensive. As an additional goal of our investigation, we would like to identify solutions that are lightweight, decentralized, and which ideally run on-device. Of course, some hybrid and decentralized approaches for the ridesharing problem have been proposed (e.g., (Simonetto et al. 2019; Guériau and Dusparic 2018)), and several of the algorithms that we include in our experimental evaluation can be implemented in a decentralized manner (e.g., (Giordani et al. 2010; Ismail and Sun 2017; Zavlanos et al. 2008; Bürger et al. 2012)), but that would typically require a larger amount of communication between the agents; in this case, the vehicles. As it turns out though, the ALMA algorithm of (Danassis et al. 2019), which has been designed with precisely these objectives in mind (low computational complexity, scalability, and low communication cost), performs very well across the board with respect to our objectives.

The third component of our CARs is the relocation of idle taxis. Relocation is an important component of a successful ridesharing application. Many studies on shared mobility systems have shown that adopting a relocation strategy can improve system performance in their specific context (Guériau and Dusparic 2018; Vosooghi et al. 2019; Martínez et al. 2017; Bélanger et al. 2016; Ruch et al. 2018; Alonso-Mora et al. 2017a; Buchholz 2018; Lioris et al. 2016; Spieser et al. 2014; Tsao et al. 2019; van Engelen et al. 2018; Wen et al. 2017; Wallar et al. 2018). Strategies include using a short window of known active requests (Alonso-Mora et al. 2017a), historical demand (Guériau and Dusparic 2018; Alonso-Mora et al. 2017b; Fielbaum et al. 2021b; Zhou et al. 2013; Xue et al. 2015; van Engelen et al. 2018), or techniques to predict future demand (Spieser et al. 2016). Yet relocation by nature increases vehicle travel distance, leading to undesirable consequences (economical, environmental, maintenance, management of human resources, etc.), so a balance needs to be struck. Most of the employed relocation approaches are coarse-grained: the network is generally divided into several zones, blocks, etc. (Guériau and Dusparic 2018; Vosooghi et al. 2019; Martínez et al. 2017), and the entities (e.g., the vehicles) move between the zones. However, compared to other shared mobility systems, dynamic ridesharing poses unique challenges that make such coarse-grained approaches inappropriate: most of them are centralized – thus computationally intensive and not scalable –, they might not take into account the actions of other vehicles, potentially leading to over-saturation of high-demand areas, and, most importantly, they are slow to adapt to the highly dynamic nature of the problem (e.g., responding to high demand generated by a concert, or the fact that vehicles remain free for only a few minutes at a time). The problem clearly calls for fine-grained solutions, yet such approaches in the literature are still rather scarce. In this paper, we employ such a fine-grained relocation scheme (similarly to (Alonso-Mora et al. 2017a)), based on matching between the idle taxis and the potential requests, which is better suited for the problem at hand.

Relocation can be viewed either as a k-center or k-Facility Location problem (Guha and Khuller 1999), or as an Online Maximum Weight Matching problem on the history of requests. Given the high complexity of the former problems (both are NP-hard, in fact APX-hard (Hsu and Nemhauser 1979; Feder and Greene 1988)), we have opted for the latter interpretation.

3 Problem statement & modeling

In this section we formally present the Ridesharing problem. To avoid introducing unnecessary notation, we only present the description of the model here; precise notation and details are provided in the respective sections where they are used.

In the Ridesharing problem there is a (potentially infinite) metric space \({\mathcal {X}}\) representing the topology of the environment, equipped with a distance function \(\delta : {\mathcal {X}} \times {\mathcal {X}} \rightarrow \mathbb {R}_{\ge 0}\). Both are known in advance. At any moment, there is a (dynamic) set of available taxi vehicles \({\mathcal {V}}_t\), ready to serve customer requests (i.e., drive to the pick-up and subsequently to the destination location). Between serving requests, vehicles can relocate to locations of potentially higher demand, to mitigate spatial search frictions between drivers. Customer requests appear in an online manner at their respective pick-up locations, wait to potentially be matched into a shared ride, and are finally served by a taxi to their respective destinations. For two requests to be able to share a ride, they must satisfy spatial and temporal constraints. The former dictates that requests should be matched only if there is good spatial overlap between their routes; due to the latter, requests cannot be matched, even with perfect spatial overlap, unless they are both ‘active’ at the same time. Finally, ridesharing is an inherently online problem, as we are unaware of the requests that will appear in the future, and need to make decisions before the requests expire, while taking into account the dynamics of the fleet of taxis.

3.1 Performance metrics

The goal is to minimize the cumulative distance driven by the fleet of taxis while maintaining high Quality of Service (QoS), given that we serve all requests (service guarantee). Serving all requests improves passenger satisfaction and, most importantly, allows us to ground our evaluation in a common scenario, ensuring a fair comparison.

3.1.1 Global metrics


Distance Driven: Minimize the cumulative distance driven by all vehicles to serve all the requests. We chose this objective as it directly correlates with passenger, company, and environmental objectives (minimizing operational cost, delay, and CO\(_2\) emissions, maximizing the number of shared rides, improving QoS, etc.). All of the evaluated algorithms have to serve all the requests, either as shared or single rides.


Complexity: Real-world time constraints dictate that the employed solution produces results in a reasonable time-frame (Footnote 4).

3.1.2 Passenger specific metrics—Quality of Service (QoS)


Time to Pair: Expected time to be paired in a shared ride, i.e., \(\mathbb {E}[t_{\text {paired}} - t_{\text {open}}]\), where \(t_{\text {open}}, t_{\text {paired}}\) denote the time the request appeared, and was paired as a shared ride, respectively. If the request is served as a single ride, then \(t_{\text {paired}}\) refers to the time the algorithm chose to serve it as such.


Time to Pair with Taxi: Expected time to be paired with a taxi, i.e., \(\mathbb {E}[t_{\text {taxi}} - t_{\text {paired}}]\), where \(t_{\text {taxi}}\) denotes the time the (shared) ride was paired with a taxi.


Time to Pick-up: Expected time to passenger pickup, i.e., \(\mathbb {E}[t_{\text {pickup}} - t_{\text {taxi}}]\), where \(t_{\text {pickup}}\) denotes the time the request was picked-up.


Delay: Additional travel time compared to the expected direct travel time (i.e., if the request were served as a single ride instead of a shared ride): \(\mathbb {E}[(t_{\text {dest}} - t_{\text {pickup}}) - (t'_{\text {dest}} - t_{\text {pickup}})]\), where \(t_{\text {dest}}\) and \(t'_{\text {dest}}\) denote the time at which the request reaches its destination, and the time at which it would have reached it as a single ride, respectively.
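
For concreteness, the following sketch (in Python) computes the four metrics above from per-request timestamps; the dictionary keys are illustrative names, not part of our notation.

from statistics import mean

def qos_metrics(requests):
    # Each request is assumed to be a dict of timestamps (in minutes); note that
    # the pick-up time cancels out in the Delay metric.
    return {
        "time_to_pair": mean(r["t_paired"] - r["t_open"] for r in requests),
        "time_to_pair_with_taxi": mean(r["t_taxi"] - r["t_paired"] for r in requests),
        "time_to_pickup": mean(r["t_pickup"] - r["t_taxi"] for r in requests),
        "delay": mean(r["t_dest"] - r["t_dest_single"] for r in requests),
    }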

Research conducted by ridesharing companies shows that passengers’ satisfaction level remains sufficiently high as long as the pick-up time is less than a certain threshold. The latter is corroborated by data on booking cancellation rate against pick-up time (Tang et al. 2017). In other words, passengers would rather have a short pick-up time and long detour, than vice-versa (Brown 2016b). This also suggests that an effective relocation scheme can considerably improve passenger satisfaction by reducing the average pick-up time (see Sect. 7.2.7).

Given the importance of short pick-up times for passenger satisfaction, we opted to distinguish and study each segment of the waiting process separately (‘Time to Pair’, ‘Time to Pair with Taxi’, and ‘Time to Pick-up’). To the best of our knowledge, we are the first to do so. Such an analysis can give a ridesharing platform a clear picture of the sources of inefficiency and help improve overall satisfaction, which in turn correlates with the growth of the company.

3.1.3 Platform specific metrics


Quality of Service (QoS): Refer to the aforementioned passenger-specific metrics (Footnote 5). Improving the QoS offered to customers correlates with the growth of the company.


Number of Shared Rides: Related to profit. By carrying more than one passenger at a time, vehicles can serve more requests in a day, which, consequently, increases income (Widdows et al. 2017). The matching rate is especially important in the nascent stage of a ridesharing platform (Dutta and Sholley 2018).


Frictions: Waiting time experienced by drivers between serving requests (i.e., the time between dropping off a ride and getting matched with another). Search frictions occur when drivers are unable to locate rides due to spatial supply and demand imbalance. Even though in our scenario matchings are performed automatically, without any searching by the drivers, lower frictions indicate a better distribution of the platform’s supply.

3.2 Modeling

Our evaluation setting is meticulously designed to resemble reality as closely as possible, in every aspect of the problem. We achieve this by using actual data from NYC’s yellow taxi trip records (Footnote 6) – both for modeling customer requests and taxis – and by running our simulations at the scale of the actual problem faced by ridesharing platforms (we run simulations with more than 390,000 requests and 12,000 taxis). Moreover, we have exhaustively designed every detail of the problem, such as the speed of the vehicles, initial positions, distance function, etc. In what follows, we describe each design aspect in detail.

3.2.1 Dataset

We used the yellow taxi trip records of 2016, provided by the NYC Taxi and Limousine Commission (Footnote 6). The dataset was cleaned to remove requests with travel time shorter than 1 minute or invalid geo-locations (e.g., outside Manhattan, the Bronx, Staten Island, Brooklyn, or Queens). For every request, the dataset provides, amongst others, the pick-up and drop-off times and geo-location coordinates. Time is discrete, with a granularity of 1 minute (same as the dataset). On average, there are 272 new requests per minute, totaling 391479 requests in the broader NYC area (352455 in Manhattan) on the evaluated day (Jan. 15). Figure 2 depicts the number of requests per minute on that day.

Fig. 2 Requests per minute on Jan. 15, 2016 (blue line). Mean = 272 requests (yellow line)

3.2.2 Taxi vehicles

A unique feature of NYC yellow taxis is that they may only be hailed from the street and are not authorized to conduct pre-arranged pick-ups. This provides an ideal setting for a counterfactual analysis for several reasons: (1) We can assume a realistic position for each taxi at the beginning of the simulation (its last drop-off location). (2) Door-to-door service can be inefficient (Fielbaum et al. 2021a; Stiglic et al. 2015), thus users may be asked to walk to/from a nearby larger street. Given that users have presumably hailed the taxis from larger streets, this results in a more accurate modeling of the origins of supply and demand. Finally, (3) all observed rides are obtained through search, thus – assuming reasonable prices and delays – customers neither have nor are willing to take an alternative means of transportation. The latter validates our choice that all of the algorithms considered must eventually serve all the requests.

By law, there are 13,587 taxis in NYC (Footnote 12). The majority of the results presented in this paper use a much smaller number of vehicles (what we call the base number) for three reasons: (1) to reduce the complexity of the problem, given that most of the employed algorithms cannot handle such a large number of vehicles, (2) to evaluate under resource scarcity – making the problem harder – so as to better differentiate between the results, and (3) to investigate the possibility of a more efficient utilization of resources, with minimal cost to the consumers. Nevertheless, we also present simulations for a wide range of fleet sizes, up to close to the total number.

The number, initial location, and speed of the taxi vehicles were calculated as follows:

  • We calculated the base number of taxis as the minimum number of taxis required to serve all requests as single rides (no ridesharing): if a request appears and all taxis are occupied serving other requests, we increase the required number of taxis by one (see the sketch after this list). This resulted in around \(4000 - 5000\) vehicles (depending on the size of the simulation, see Sect. 7.2). Simulations were conducted for \(\{\times 0.5, \times 0.75, \times 1.0, \times 2.0, \times 3.0\}\) the base number.

  • Given a number of taxis, V, the initial positions of the taxis are the drop-off locations of the last V requests prior to the starting time of the simulation. To avoid a cold start, we compute the drop-off time of each such request and assume the corresponding vehicle is occupied until then.

  • The vehicles’ average speed is estimated at 6.2 m/s (22.3 km/h), based on the trip distance and time per trip as reported in the dataset, and corroborated by the related literature (in (Santi et al. 2014) the speed was estimated at \(5.5 - 8.5\) m/s, depending on the time of day).
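
The base-number computation in the first bullet amounts to a simple greedy simulation; the sketch below illustrates it under the simplifying assumptions that any free taxi can serve any request and that pick-up travel time is ignored.

import heapq

def base_fleet_size(requests):
    # requests: list of (pickup_time, trip_duration) pairs, sorted by pickup_time.
    busy_until = []  # min-heap of times at which currently busy taxis free up
    fleet = 0
    for pickup_time, trip_duration in requests:
        if busy_until and busy_until[0] <= pickup_time:
            heapq.heappop(busy_until)  # reuse a taxi that has already freed up
        else:
            fleet += 1  # all taxis are occupied: add one more to the fleet
        heapq.heappush(busy_until, pickup_time + trip_duration)
    return fleet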

3.2.3 Customer requests

A request, r, is a tuple \(\langle t_r, s_r, d_r, k_r \rangle\). Request r appears (becomes open) at its respective pick-up time (\(t_r\)) and geo-location (\(s_r\)). Let \(d_r\) denote its destination. Each request admits a willingness to wait (\(k_r\)) to find a match (rideshare), i.e., we assume dynamic waiting periods per request. The rationale behind \(k_r\) is that requests with longer trips are more willing to wait to find a match than requests with nearby destinations. After \(k_r\) time-steps we call request r critical. If a critical request is not matched, it has to be served as a single ride. Recall that in our setting all of the requests must be served. Let \({\mathcal {R}}_t^{\text {open}}, {\mathcal {R}}_t^{\text {critical}}\) denote the sets of open and critical requests respectively, and let \({\mathcal {R}}_t = {\mathcal {R}}_t^{\text {open}} \cup {\mathcal {R}}_t^{\text {critical}}\).

We calculate \(k_r\) as in related literature (Danassis et al. 2019). Let \(w_{\text {min}}\) and \(w_{\text {max}}\) be the minimum and maximum possible waiting times, i.e., \(w_{\text {min}} \le k_r \le w_{\text {max}}, \forall r\). Knowing \(s_r, d_r\), we can compute the expected trip time (\(\mathbb {E}[t_{\text {trip}}]\)). Assuming people are willing to wait in proportion to their trip time, let \(k_r = q \times \mathbb {E}[t_{\text {trip}}]\), where \(q \in [0, 1]\). \(w_{\text {min}}, w_{\text {max}}\), and q can be set by the ridesharing company based on customer satisfaction (following (Danassis et al. 2019), we set \(w_{\text {min}} = 1\), \(w_{\text {max}} = 3\), and \(q = 0.1\)).
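
For illustration, the resulting waiting time can be computed as follows (the constants mirror the values above; the function name is ours):

W_MIN, W_MAX, Q = 1, 3, 0.1  # minutes, minutes, fraction of the expected trip time

def willingness_to_wait(expected_trip_minutes):
    # k_r = q * E[t_trip], clamped to [w_min, w_max];
    # e.g., a 25-minute trip yields k_r = 2.5, while a 5-minute trip is clamped to 1.
    return min(W_MAX, max(W_MIN, Q * expected_trip_minutes))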

3.2.4 Rides

A (shared) ride, \(\rho\), is a pair \(\langle r_1, r_2 \rangle\) composed of two requests. If a request r is served as a single ride, then \(r_1 = r_2 = r\). Let \({\mathcal {P}}_t\) denote the set of rides waiting to be matched to a taxi at time t. Contrary to some recent literature on high-capacity ridesharing (e.g., (Alonso-Mora et al. 2017a; Lowalekar et al. 2019)), we purposefully restrict ourselves to rides of at most two requests for two reasons: complexity and passenger satisfaction. The complexity of the problem grows rapidly as the number of potential matches increases, while most of the proposed/evaluated approaches already struggle to tackle matchings of size two at the scale of a real-world application. Moreover, even though a fully utilized vehicle would ultimately be a more efficient use of resources, it diminishes passenger satisfaction (a frequent worry being that the ride will become interminable, according to internal research by ridesharing companies) (Widdows et al. 2017; Brown 2016a). Given that serving all requests is a hard constraint, we do not assume a time limit on matching rides with taxis; instead, we treat this time as a QoS metric.

3.2.5 Distance function

The optimal choice of distance function would be the actual driving distance. Yet, our simulations require trillions of distance calculations, which makes this computationally unattainable. Given that locations are provided as latitude and longitude coordinates, it is tempting to use the Haversine formula (Footnote 7) to estimate the straight-line (Euclidean) distance, as in related literature (Santos and Xavier 2013; Brown 2016a). We have instead opted for the Manhattan distance, given that the simulation takes place mostly in Manhattan. To evaluate this choice, we collected more than 12 million actual driving distances using the Open Source Routing Machine (project-osrm.org), which computes shortest paths in road networks. The Manhattan distance's standard and mean squared error, compared to the actual driving distance, were \(-0.5 \pm 2.9\) km and \(1.7 \pm 2.4\) km respectively, while the Euclidean distance's were \(-3.2 \pm 3.8\) km and \(3.2 \pm 3.8\) km respectively.
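
For illustration, the two candidate distance functions can be computed from latitude/longitude coordinates as sketched below; the Manhattan variant simply sums a north-south and an east-west great-circle leg, which is an approximation that ignores the rotation of Manhattan’s street grid.

from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine(lat1, lon1, lat2, lon2):
    # Great-circle ("straight-line") distance in km between two points.
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlam = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def manhattan(lat1, lon1, lat2, lon2):
    # L1 distance: north-south leg plus east-west leg.
    return haversine(lat1, lon1, lat2, lon1) + haversine(lat2, lon1, lat2, lon2)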

3.2.6 Embedding into HSTs

A starting point of many of the employed k-server algorithms is embedding the input metric space \({\mathcal {X}}\) into a distribution \(\mu\) over \(\sigma\)-hierarchically well-separated trees (HSTs), with separation \(\sigma = \Theta (\log |{\mathcal {X}}| \log (k \log |{\mathcal {X}}|))\), where \(|{\mathcal {X}}|\) denotes the number of points. It has been shown that solving the problem on HSTs suffices, as any finite metric space can be embedded into a probability distribution over HSTs with low distortion (Fakcharoenphol et al. 2003). The distortion is of order \({\mathcal {O}}(\sigma \log _\sigma |{\mathcal {X}}|)\), and the resulting HSTs have depth \({\mathcal {O}}(\log _\sigma \Delta )\), where \(\Delta\) is the diameter of \({\mathcal {X}}\) (Bansal et al. 2015).

Given the popularity of the aforementioned method, it is worth examining the size of the resulting trees. Given that the geo-coordinate system is a discrete metric space, we could directly embed it into HSTs. Yet, the size of the space is huge, thus for better discretization we have opted to generate the graph of the street network of NYC. To do so, we used data from openstreetmap.org. Similarly to (Santi et al. 2014), we filtered the streets selecting only primary, secondary, tertiary, residential, unclassified, road, and living street classes, using those as undirected edges and street intersections as nodes. The resulting graph for NYC contains 66543 nodes, and 95675 edges (5018, and 8086 for Manhattan). Given that graph, we generate the HSTs (Santi et al. 2014).

Table 1 Evaluated CARs

4 Component algorithms for ridesharing

In this section, we describe our design choices for developing Component Algorithms for Ridesharing (CARs). Each CAR is composed of three parts (Fig. 1): (a) request–request matching to create a (shared) ride, (b) ride-to-taxi matching, and (c) relocation of the idle fleet. Each of these components is a significant problem in its own right, and complexity issues make the simultaneous consideration of all three impractical. Instead, a more realistic approach is to tackle each component individually, with minimal consideration of the remaining two (Footnote 8). The algorithms that we consider are appropriate modifications of the most significant ones that have been proposed for the key algorithmic primitives of the ridesharing problem (see Sects. 1.1 and 2), i.e., online and offline matching algorithms, with or without delays, for steps (a), (b), and (c), k-taxi/server algorithms for step (b), as well as heuristic approaches that were specifically designed with the ridesharing application in mind.

A list of all the CARs that we designed and evaluated (14 in total) can be found in Table 1, while in the following sections we provide a detailed description of each CAR component.

4.1 CAR components

We evaluated a variety of approaches, ranging from offline maximum weight matching (MWM) and greedy solutions to online MWM, k-Taxi/Server algorithms, and linear programming. Offline algorithms (e.g., MWM, ALMA, Greedy) can be run either in a just-in-time (JiT) manner – i.e., when a request becomes critical – or in batches, i.e., every x minutes (given that our dataset has a granularity of 1 minute, we run in batches of 1 and 2 minutes).

Matching Graphs: At time t, let \({\mathcal {G}}_a = ({\mathcal {R}}_t, {\mathcal {E}}^a_t)\), where \({\mathcal {E}}^a_t\) denotes the weighted edges between requests. With a slight abuse of notation, let \(\delta (s_{r_1}, s_{r_2}, d_{r_1}, d_{r_2})\) denote the minimum distance required to serve both \(r_1\) and \(r_2\) (as a shared ride, i.e., excluding the case of first serving one of them and then the other) with a single taxi located either at \(s_{r_1}\) or \(s_{r_2}\). The weight \(w_{r_1, r_2}\) of an edge \((r_1, r_2) \in {\mathcal {E}}^a_t\) is defined as \(w_{r_1, r_2} = \delta (s_{r_1}, d_{r_1}) + \delta (s_{r_2}, d_{r_2}) - \delta (s_{r_1}, s_{r_2}, d_{r_1}, d_{r_2})\) (similarly to (Danassis et al. 2019; Alonso-Mora et al. 2017a)). If \(r_1 = r_2\), let \(w_{r_1, r_2} = 0\) (single-passenger ride). Intuitively, this number is an approximation (given that it is impossible to know in advance the location of the taxi that will serve the ride) of the travel distance saved by matching requests \(r_1\) and \(r_2\) (Footnote 9).

Similarly, at time t, let \({\mathcal {G}}_b = ({\mathcal {V}}_t \cup {\mathcal {P}}_t, {\mathcal {E}}^b_t)\), where \({\mathcal {E}}^b_t\) denotes the weighted edges between rides and taxis. With a slight abuse of notation, let \(\delta (s_v, s_{r_1}, s_{r_2}, d_{r_1}, d_{r_2})\) denote the minimum distance required (out of all the possible pick-up and drop-off combinations) to serve both requests \(r_1\) and \(r_2\) (which compose the (shared) ride \(\rho\)) with a single taxi located at \(s_v\). The weight \(w_{v, \rho }\) of an edge \((v, \rho ) \in {\mathcal {E}}^b_t\) is defined as \(w_{v, \rho } = 1 / \delta (s_v, s_{r_1}, s_{r_2}, d_{r_1}, d_{r_2})\). If \(r_1 = r_2\) (single-passenger ride), let \(\delta (s_v, s_{r_1}, s_{r_2}, d_{r_1}, d_{r_2}) = \delta (s_v, s_{r_1}, d_{r_1})\). For step (b) of the Ridesharing problem, we run the offline algorithms every time the set of rides (\({\mathcal {P}}_t\)) is non-empty.
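
The following sketch illustrates how these weights can be computed, assuming a distance oracle delta(p, q) (the Manhattan distance in our case); shared_route_length enumerates the four feasible shared-ride stop orders (both pick-ups precede both drop-offs) and is a simplification of the actual computation.

def shared_route_length(delta, start, s1, d1, s2, d2):
    # Shortest route from `start` serving both requests as a shared ride,
    # i.e., excluding the back-to-back orders (s1, d1, s2, d2) and (s2, d2, s1, d1).
    orders = [(s1, s2, d1, d2), (s1, s2, d2, d1), (s2, s1, d1, d2), (s2, s1, d2, d1)]
    best = float("inf")
    for stops in orders:
        length, pos = 0.0, start
        for stop in stops:
            length += delta(pos, stop)
            pos = stop
        best = min(best, length)
    return best

def weight_request_pair(delta, s1, d1, s2, d2):
    # w_{r1,r2}: approximate distance saved by sharing, with the serving taxi
    # assumed to start at either of the two pick-up locations.
    shared = min(shared_route_length(delta, s1, s1, d1, s2, d2),
                 shared_route_length(delta, s2, s1, d1, s2, d2))
    return delta(s1, d1) + delta(s2, d2) - shared

def weight_ride_taxi(delta, s_v, s1, d1, s2, d2):
    # w_{v,rho}: inverse of the shortest distance for a taxi at s_v to serve the ride.
    return 1.0 / max(shared_route_length(delta, s_v, s1, d1, s2, d2), 1e-9)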

4.1.1 Maximum weight matching (MWM)

The maximum weight matching algorithm finds a matching with maximum total edge weight in a graph. We use a maximum weight matching algorithm to

  • match requests into shared rides (step (a) of the Ridesharing problem), i.e., find a matching on \({\mathcal {G}}_a\) that maximizes the quantity \(\sum _{(r_1,r_2) \in {\mathcal {E}}^a_t} w_{r_1,r_2}\).

  • match rides with taxis (step (b) of the Ridesharing problem), i.e., find a matching on \({\mathcal {G}}_b\) that maximizes the quantity \(\sum _{(v, \rho ) \in {\mathcal {E}}^b_t} w_{v, \rho }\).

In both cases, we use the well-known blossom algorithm of Edmonds (1965). Not surprisingly, MWM results in high-quality allocations, but this comes with an overhead in running time compared to simpler, ‘local’ solutions (see Sect. 7.2). This is because the blossom algorithm’s worst-case time complexity – on a graph \(G = (V, E)\) – is \({\mathcal {O}}(|E| |V|^2)\), and we have to run it three times, once for each step of the Ridesharing problem. Additionally, the MWM algorithm inherently requires a global view of the whole request set in a time window, and is therefore not a good candidate for the fast, decentralized solutions that are more appealing for real-life applications.
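
As an illustration, step (a) with an off-the-shelf blossom implementation (networkx.max_weight_matching) could look as follows; pairwise_weight stands for the weight \(w_{r_1, r_2}\) defined above, and requests are assumed to be hashable identifiers.

import networkx as nx

def match_requests(requests, pairwise_weight):
    G = nx.Graph()
    G.add_nodes_from(requests)
    for i, r1 in enumerate(requests):
        for r2 in requests[i + 1:]:
            w = pairwise_weight(r1, r2)
            if w > 0:  # only pairs whose shared ride actually saves distance
                G.add_edge(r1, r2, weight=w)
    matching = nx.max_weight_matching(G)  # set of matched (r1, r2) pairs
    matched = {r for pair in matching for r in pair}
    singles = [r for r in requests if r not in matched]  # served as solo rides
    return matching, singles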

4.1.2 ALtruistic MAtching Heuristic (ALMA), (Danassis et al. 2019, 2022, 2021; Danassis 2022; Danassis and Faltings 2020)

ALMA is a recently proposed lightweight heuristic for weighted matching. A distinctive characteristic of ALMA is that agents (in our context: requests/rides) make decisions locally, based solely on their own utilities. In particular, while contesting for a resource (in our context: a request/taxi), each agent backs off with a probability that depends on its own utility loss from switching to its next most preferred resource. For example, for step (b) of the Ridesharing problem, suppose that for the agent representing ride \(\rho\), the next most preferred taxi after v is \(v'\); then \(loss = w_{v, \rho } - w_{v', \rho }\). The back-off probability (\(P(\cdot )\)) is computed individually and locally, based on Eq. (1) (Footnote 10).

$$\begin{aligned} P(loss) = {\left\{ \begin{array}{ll} 1 - \epsilon , &{} \text { if } loss \le \epsilon \\ \epsilon , &{} \text { if } 1 - loss \le \epsilon \\ 1 - loss, &{} \text { otherwise} \end{array}\right. } \end{aligned}$$
(1)

Intuitively, agents that do not have good alternatives are less likely to back off, and vice versa. The algorithm is inherently decentralized, requires only 1-bit partial feedback from the resource (indicating whether the resource is free or not), and has running time that is constant in the total problem size, under reasonable assumptions on the preference domain of the agents. Thus, it is an ideal candidate for an on-device solution. Moreover, in (Danassis et al. 2019) it was shown to achieve high-quality results on a simpler version of step (a) of the Ridesharing problem, and in (Danassis et al. 2022) it was shown that it can be adapted to protect the privacy of the agents.
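
A minimal sketch of the back-off rule of Eq. (1), assuming losses are normalized to [0, 1] and \(\epsilon\) is a small constant:

import random

EPSILON = 0.01  # illustrative value; a small constant in (0, 1)

def backoff_probability(loss, eps=EPSILON):
    if loss <= eps:
        return 1.0 - eps  # good alternatives exist: back off almost surely
    if loss >= 1.0 - eps:
        return eps        # no good alternative: almost never back off
    return 1.0 - loss

def backs_off(loss):
    return random.random() < backoff_probability(loss)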

4.1.3 Greedy

Greedy is a very simple algorithm: it selects a node \(i \in V\) of a graph \(G=(V,E)\) uniformly at random, considers all the edges (i, j) with endpoint i, and matches i with a node \(j^{*}\) that is the endpoint of the largest-weight such edge, i.e., \(j^{*} \in \arg \max _j (w_{i, j})\). Greedy approaches are appealing (Footnote 11), not only due to their low complexity, but also because real-time constraints dictate short planning windows, which diminish the benefit of batch optimization solutions compared to myopic approaches (Widdows et al. 2017).
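
A minimal sketch of this component follows; weight(i, j) is assumed to return the edge weight, or None if the edge does not exist, and nodes left unmatched are later served as single rides.

import random

def greedy_matching(nodes, weight):
    unmatched, matching = set(nodes), []
    while len(unmatched) > 1:
        i = random.choice(list(unmatched))  # pick an unmatched node at random
        unmatched.remove(i)
        candidates = [(weight(i, j), j) for j in unmatched if weight(i, j) is not None]
        if not candidates:
            continue  # i remains unmatched
        _, j_star = max(candidates, key=lambda c: c[0])  # heaviest incident edge
        unmatched.remove(j_star)
        matching.append((i, j_star))
    return matching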

4.1.4 Approximation (Appr), (Bei and Zhang 2018)

Approximation (Appr) is a recently proposed offline algorithm due to Bei and Zhang (2018), which can be used to solve steps (a) and (b) of the Ridesharing problem. The algorithm takes a two-phase approach that is also based on maximum weight matchings (or, more accurately, the equivalent notion of minimum cost matchings), but with a different set of weights from the ones we defined for the MWM algorithm. In particular (see the sketch after the list):

  • First, it matches requests into shared rides using a minimum cost matching, where the cost of a request pair is the shortest distance required to serve it under the worse of the two possible pick-up orders. Formally, the algorithm defines the quantities:

    $$\begin{aligned} w_{ij}= & {} \min \left\{ \delta (s_1,s_2)+\delta (s_2,d_1)+\delta (d_1,d_2), \delta (s_1,s_2)+\delta (s_2,d_2)+\delta (d_2,d_1)\right\} \\ w_{ji}= & {} \min \left\{ \delta (s_2,s_1)+\delta (s_1,d_1)+\delta (d_1,d_2), \delta (s_2,s_1)+\delta (s_1,d_2)+\delta (d_2,d_1)\right\} \end{aligned}$$

    and then chooses \(w^1(i,j) = \max \{w_{ij},w_{ji}\}\). Intuitively, \(w_{ij}\) is the distance of the shortest path that picks up request \(r_1\) first (at its source location \(s_1\)), and similarly, \(w_{ji}\) is the distance of the shortest path that picks up request \(r_2\) first.

  • Then it matches rides to taxis, again using a minimum cost matching, where the weight is the distance to the closer of the two pick-up locations. Formally, let \(w^2(v,\langle r_i, r_j \rangle ) = \min \{\delta (s_v, s_i), \delta (s_v,s_j)\}\), where \(s_v\) is the position of taxi v, and compute a minimum cost matching in the bipartite graph defined by the pairs \(\langle r_i, r_j \rangle\) matched in the previous step and the taxis, with weights defined by \(w^2\).

    Bei and Zhang (2018) prove a worst-case approximation guarantee of 2.5 for the algorithm.
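
The sketch below spells out the two weight functions (the helper names are ours; delta(p, q) is the distance oracle and each request is a (source, destination) pair):

def w1(delta, r1, r2):
    # Worse of the two pick-up orders, each with its best drop-off order.
    (s1, d1), (s2, d2) = r1, r2
    w_ij = delta(s1, s2) + min(delta(s2, d1) + delta(d1, d2),
                               delta(s2, d2) + delta(d2, d1))
    w_ji = delta(s2, s1) + min(delta(s1, d1) + delta(d1, d2),
                               delta(s1, d2) + delta(d2, d1))
    return max(w_ij, w_ji)

def w2(delta, s_v, r1, r2):
    # Distance from the taxi at s_v to the closer of the two pick-up locations.
    return min(delta(s_v, r1[0]), delta(s_v, r2[0]))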

4.1.5 Postponed greedy (PG), (Ashlagi et al. 2019)

Postponed Greedy (PG) is another very recently proposed algorithm for the maximum weight online matching problem with deadlines (step (a) of the Ridesharing problem). The algorithm is online, meaning that it accounts for the potential requests that might appear in the future when making decisions about the present; its competitive ratio was proven to be 1/4 by Ashlagi et al. (2019). Contrary to our setting, the algorithm was designed for fixed deadlines, i.e., \(k_r = c, \forall r \in {\mathcal {R}}\).

The algorithm is best described in terms of an auction environment (Ashlagi et al. 2019) as follows. Let \(S_t\) and \(B_t\) be the sets of virtual sellers and virtual buyers at time t respectively. When a request r appears at time t, the algorithm creates a virtual seller \(s_r\) and a virtual buyer \(b_r\) for that request, and adds them to the aforementioned sets, i.e., \(S_t \leftarrow S_{t-1} \cup \{s_r\}\) and \(B_t \leftarrow B_{t-1} \cup \{b_r\}\). In other words, every request has two copies: a buyer and a seller. These are then placed in a virtual weighted bipartite graph \(G=(S_t,B_t,E_t)\), where the edge weights are defined in the same manner as the weights of \({\mathcal {G}}_a\) (see ‘Matching Graphs’ in Sect. 4.1). The algorithm proceeds to match the newly added buyer \(b_r\) with a seller \(s_{r^*}\) in a greedy manner, i.e., \((b_r,s_{r^*}) \in \underset{r' \in S_{t-1}}{\arg \max }(w_{r, r'})\). This choice remains fixed for subsequent time steps. When the request r becomes critical (i.e., the deadline is about to be met), the ‘role’ of the request as either a seller or a buyer is conclusively chosen (uniformly at random). If r is a seller, and a subsequent buyer was matched with r, the match is finalized and is included in the output matching.

The major difference between the setting considered by Ashlagi et al. (2019) and ours is that in our case requests become critical out of order, and a critical request cannot be matched later. Thus, we apply the following modification: when a request becomes critical and is determined to be a seller, the match is finalized (if one has been found); otherwise, the request is treated as a single ride.

4.1.6 Greedy dual (GD), (Bienkowski et al. 2018)

Greedy Dual is an online algorithm for minimum cost (bipartite) perfect matching with delays, i.e., for both steps (a) and (b) of the Ridesharing problem, based on the popular primal-dual technique (Goemans and Williamson 1997). The weight (cost) of an edge in this setting also incorporates arrival times; specifically:

$$\begin{aligned} w_{r_1, r_2} = \frac{\left( \delta \left( s_1, s_2\right) + \delta \left( d_1, d_2\right) \right) }{u_{\text {average}}} + |t_1 - t_2|, \end{aligned}$$

where \(u_{\text {average}}\) is the average speed (see Sect. 3.2.2). The algorithm partitions all the requests into active sets, starting with the singleton \(\{r\}\) for a newly arrived request r. As is typical in the primal-dual approach, at every time-step t these active sets ‘grow’, until the weights of the edges between different active sets make the corresponding dual constraints of the problem tight (i.e., satisfied with equality). At this point the active sets merge, and the algorithm matches as many pairs of free requests in these sets as possible.

The algorithm has a competitive ratio of \({\mathcal {O}}(|{\mathcal {R}}|)\) and works on infinite metric spaces, which potentially makes it better suited for applications like the Ridesharing problem. Yet, in our setup, it does not take into account the willingness to wait (\(k_r\)), thus missing matches involving requests that have become critical. Although it is also designed for bipartite matchings, we opted against using it for step (b), since that would require creating a new node every time a taxi vehicle drops off a ride and becomes available.

4.1.7 Balance (Bal), (Manasse et al. 1990)

Balance is a simple and classic algorithm for the k-server problem from the competitive analysis literature. The rationale behind the algorithm is that it tries to balance out the distance traveled by the taxis over the course of their operation, keeping their workloads as equal as possible. In particular, a ride is served by the taxi that has the minimum sum of the distance traveled so far plus its distance to the source of the ride (chosen uniformly at random between the sources of the two requests composing the ride). Specifically, ride \(\rho\) will be matched to taxi v:

$$\begin{aligned} (v, \rho ) = \underset{v \in {\mathcal {V}}_t}{\arg \min }\left( \text {driven}(v) + \delta \left( s_v, s_{\rho }\right) \right) \end{aligned}$$
(2)

where \(\text {driven}(v)\) denotes the distance driven by taxi v so far, and \(s_{\rho }\) is selected equiprobably among \(s_1\) and \(s_2\). The algorithm is min-max fair, i.e., it greedily minimizes the maximum accumulated distance among the taxis. The competitive ratio of the algorithm is \(|{\mathcal {X}}|-1\) in arbitrary metric spaces with \(|{\mathcal {X}}|\) points (Manasse et al. 1990).
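
A minimal sketch of this rule, assuming each taxi is stored with its current position and accumulated distance (data structures and names are ours):

import random

def balance_assign(taxis, ride_sources, delta):
    # taxis: dict mapping taxi id -> (position, distance_driven_so_far);
    # ride_sources: (s1, s2), the pick-up locations of the two requests.
    source = random.choice(list(ride_sources))  # source chosen equiprobably
    return min(taxis, key=lambda v: taxis[v][1] + delta(taxis[v][0], source))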

4.1.8 Harmonic (Har), (Raghavan and Snir 1989)

The Harmonic algorithm (Har) is another classic randomized algorithm from the k-server literature; it is simple and memoryless (i.e., it does not need to ‘remember’ the decisions it took in previous steps). The algorithm matches a taxi with a ride with probability inversely proportional to the taxi’s distance from the ride’s source (chosen uniformly at random between the sources of the two requests composing the ride). Specifically, ride \(\rho\) will be matched to taxi v with probability:

$$\begin{aligned} P(v, \rho ) = \frac{\frac{1}{\delta \left( s_v, s_{\rho }\right) }}{\underset{\rho ' \in {\mathcal {P}}_t}{\sum }{\frac{1}{\delta \left( s_v, s_{\rho '}\right) }}} \end{aligned}$$
(3)

where \(s_{\rho }\) and \(s_{\rho '}\) are both selected equiprobably among \(s_1\), \(s_2\) and \(s_{1'}\), \(s_{2'}\), respectively. The trade-off for its simplicity is the high competitive ratio, which is \({\mathcal {O}}(2^{|{\mathcal {V}}|} \log |{\mathcal {V}}|)\) (Bartal and Grove 2000).
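
A minimal sketch of one reading of Eq. (3), in which an available taxi samples a waiting ride with probability inversely proportional to the distance to the ride’s (randomly chosen) source; names and data structures are ours.

import random

def harmonic_pick_ride(taxi_pos, rides, delta):
    # rides: dict mapping ride id -> (s1, s2), the two pick-up locations.
    inv = {}
    for rho, (s1, s2) in rides.items():
        source = random.choice([s1, s2])  # source chosen equiprobably
        inv[rho] = 1.0 / max(delta(taxi_pos, source), 1e-9)
    total = sum(inv.values())
    return random.choices(list(inv), weights=[w / total for w in inv.values()])[0]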

4.1.9 Double coverage (DC), (Chrobak et al. 1990)

Double Coverage (DC) is one of the two most famous k-server algorithms in the literature. The algorithm is designed to run on a specific type of metric space called an HST (Hierarchically Separated Tree, see Sect. 3.2.6). For a general metric space \({\mathcal {X}}\), the algorithm can be applied by first embedding \({\mathcal {X}}\) into an HST (a process referred to as an ‘HST embedding’). This process ‘simulates’ the general space \({\mathcal {X}}\) by an HST, in the sense that the HST approximately captures the properties of the original space \({\mathcal {X}}\). The points of \({\mathcal {X}}\) are the leaves of the HST.

Given an HST, the algorithm works as follows. To determine which taxi will serve a ride, all unobstructed taxis move towards its source, i.e., a leaf of the HST (chosen randomly between the sources of the two requests sharing the ride) with equal speed. Initially, all taxis are unobstructed. During this movement process, a taxi becomes obstructed when its path from its current location to the leaf corresponding to the ride is ‘blocked’ by another taxi, meaning that it would have to move through the same position in the tree that another taxi has already been at, to reach the leaf. In this case, the taxi stops (as the ‘blocking’ taxi is closer to serving the ride), while the remaining taxis keep moving as before. When some taxi reaches the leaf corresponding to the ride, the process stops, and each taxi maintains its current position on the HST.

To implement the algorithm, we first appropriately discretize our metric space and then perform the HST embedding as described in (Bartal 1996; Fakcharoenphol et al. 2004) (see Sect. 3.2.6 for more details). Given that only the leaves correspond to locations on \({\mathcal {X}}\), we chose to implement the lazy version of the algorithm (which is worst-case equivalent to the original definition, see, e.g., (Koutsoupias 2009)), i.e., only the taxi serving the ride actually moves on \({\mathcal {X}}\); one can envision a process in which the taxis ‘virtually’ move as described above, but, once the ride has been served, all taxis are restored to their original positions. This is also in line with the main goal of minimizing the distance driven. The algorithm is k-competitive on all tree metrics (Chrobak and Larmore 1991a).

4.1.10 Work function (WFA), (Chrobak and Larmore 1991b; Koutsoupias and Papadimitriou 1995)

The Work Function algorithm (WFA) is perhaps the most important k-server algorithm, as it provides the best competitive ratio to date, due to the celebrated result of (Koutsoupias and Papadimitriou 1995). Intuitively, to decide which taxi (or server) will be the one to serve a ride that just appeared at time t, and, more generally, the movement of the other taxis, the algorithm:

  • computes the (offline) optimal solution until time \(t - 1\), meaning the best possible allocation of rides to taxis using the information from the beginning of the algorithm until the appearance of the ride at time t,

  • computes a greedy cost for switching between configurations,

  • chooses the new taxi positions that minimize the sum of the two aforementioned costs.

More formally, let \(L^t = (l^t_1, l^t_2, \dots , l^t_{|{\mathcal {V}}|})\) denote the configuration of the fleet of taxis \({\mathcal {V}}\) at time-step t, i.e., a vector of taxi locations, where \(l^t_v\) specifies the location of taxi v. Let \(\text {OPT}_t(L)\) be the optimal (total distance-minimizing) way of serving rides that appear at times 1 through t, such that the taxis end up at configuration L. To choose configuration \(L^{t}\), it uses the following rule:

$$\begin{aligned} L^{t} = \arg \min _{L}\left\{ \text {OPT}_t(L)+ d\left( L^{t-1},L\right) \right\} \end{aligned}$$

The WFA serves ride \(\rho _t\) at time-step t by switching from the current taxi configuration \(L^{t-1}\), to a new configuration \(L^{t}\). Specifically, it selects \(L^{t}\) which minimizes (a) the minimum total cost of starting from \(L^{0}\), serving in turn \(\rho _0, \rho _1, \dots , \rho _{t-1}\), and ending up in \(L^{t}\), plus (b) the distance traveled by a taxi to move from its position in \(L^{t-1}\) to that in \(L^{t}\).

An obvious obstacle that makes the algorithm intractable in practice is that its complexity increases from step to step, resulting in computation and/or memory issues. To circumvent this obstacle, we implemented an efficient variant using network flows, as described in (Rudec et al. 2013). Yet, as the authors of (Rudec et al. 2013) also state, the only practical way of using the WFA is to switch to its window version, w-WFA, where we only optimize over the last w rides. Even though the complexity of w-WFA does not change between time-steps, it does change with the number of taxis. The resulting network has \(2|{\mathcal {P}}| + 2|{\mathcal {V}}| + 2\) nodes, and we have to run the Bellman-Ford algorithm (Bellman 1958) at least once to compute the node potentials and make the costs positive (Bellman-Ford runs in \({\mathcal {O}}(|{\mathcal {P}}||{\mathcal {V}}|)\)). We refer the reader to (Bertsekas 1998) for more details on network optimization. As before, the source of the ride is chosen randomly between the sources of the two requests composing the ride.

4.1.11 k-Taxi, (Coester and Koutsoupias 2019)

This is a very recent algorithm for the k-taxi problem, which provides the best possible competitive ratio. The algorithm operates on HSTs, where the rides and taxis at any time are placed at its leaves. First, it generates a Steiner tree that spans the leaves that have taxis or rides, and then uses this tree to schedule rides, by simulating an electrical circuit. In particular, whenever a ride appears at a leaf, the algorithm interprets the edges of the tree with length R as resistors with resistance R, which determine the fraction of the current flow that will be routed from the node corresponding to the taxi towards the ride. These fractions are then interpreted as probabilities which determine which taxi will be chosen to pick up the ride.

4.1.12 High capacity (HC), (Alonso-Mora et al. 2017a)

This algorithm comes from a highly cited paper, and is the only one among our evaluated approaches that addresses vehicle relocation (step (c)). Contrary to our approach, it tackles steps (a) and (b) simultaneously, leaving step (c) as a separate sub-problem. The algorithm consists of five steps:

  (i) Computing a pairwise request-vehicle shareability graph (RV-graph) (Santi et al. 2014). The RV-graph represents which requests and vehicles might be pairwise shared, with edges connecting all possible requests to pair and all possible vehicles to serve a request.

  (ii) Computing a graph consisting of the feasible (candidate) trips and the sets of vehicles that can execute them (RTV-graph). This is a tripartite graph with edges connecting requests to trips (a request is connected to a trip if it is part of it), and edges connecting trips to vehicles (an edge between a vehicle and a trip exists if the vehicle is able to serve the trip).

  (iii) Computing a greedy solution on the RTV-graph. In this step, rides are assigned to vehicles iteratively in decreasing size of the trip (in our case, we first assign shared rides (two requests) and then single rides) and increasing cost (e.g., delay).

  (iv) Solving an ILP to compute the best assignment of vehicles to trips, using the previously computed greedy solution as the initial solution.

  (v) (optional) Rebalancing of free vehicles. If any unassigned requests remain, an ILP is solved to optimally assign them to idle vehicles based on travel times.

We use CPLEX (Bliek et al. 2014) to solve the ILPs.

4.1.13 Baseline: single ride

Uses MWM to schedule the serving of single rides to taxis (there is no ridesharing, i.e., we omit step (a) of the Ridesharing problem).

4.1.14 Baseline: random

Makes random matches, provided that the edge weight is non-negative.

While our evaluation contains many recently proposed algorithms for matching, the observant reader might notice that, with the exception of k-taxi, our k-server algorithms are from the classical literature. We did consider more recent k-server algorithms (e.g., (Dehghani et al. 2017; Lee 2018; Bansal et al. 2015)), but their complexity turns out to be prohibitive. This is mainly because they proceed via an ‘online rounding’ of an LP-relaxation of the problem, which maintains a variable for every (time-step, point in the metric space) pair. Even for one hour (3600 time-steps) and our discretization of Manhattan (5018 nodes), we need more than 18 million variables (230 million for NYC).

5 Scalability challenges

To highlight the challenges in the design of CARs, we refer to our evaluation setting (see Sect. 3.2), which accurately models a real-world application in terms of both scale and detail. Let \({\mathcal {V}}\), \({\mathcal {R}}\) denote the sets of vehicles and requests, respectively. Recall that in our setting, which involves real data from NYC taxi records, there are 272 new requests per minute on average, totaling 391479 requests in the broader NYC area (352455 in Manhattan) on the evaluated day (Jan. 15, 2016). By law, there are 13,587 taxis in NYC (Footnote 12).

5.1 ILP approaches

A natural approach would be to use Integer Linear Programs (ILPs) for matching passengers to other passengers or rides, under spatial and temporal constraints, similarly to the High Capacity algorithm of (Alonso-Mora et al. 2017a) (which can be seen as a CAR with steps (a) and (b) intertwined). As is commonly the case with ILPs, the problem is scalability: the number of variables can be as large as \({\mathcal {O}}(|{\mathcal {V}}| |{\mathcal {R}}|^2)\) – which results in 27 - 216 million variables, given that at every time-step we have approximately 300 - 600 requests and as many taxis – and the number of constraints is \(|{\mathcal {V}}| + |{\mathcal {R}}|\). This makes ILP approaches prohibitive as components in CARs; at this scale it is hard to even compute the initial greedy solution in real time. Alonso-Mora et al. circumvent this issue by enforcing delay constraints: specifically, they ignore requests that are not matched to any vehicle within a maximum waiting time. This is not possible in our model, since we have to serve all requests (service guarantee).Footnote 13
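For concreteness, with \(|{\mathcal {V}}| = |{\mathcal {R}}| = 300\) and \(600\) respectively, the variable count spans

\[
300 \times 300^{2} = 2.7 \times 10^{7}
\quad \text{to} \quad
600 \times 600^{2} = 2.16 \times 10^{8},
\]

which is the 27 - 216 million range quoted above.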

5.2 MWM approaches

Given that all three parts of the ridesharing problem can be viewed as matching problems, a natural approach would be to run maximum-weight matching (MWM) in batches (e.g., (Bei and Zhang 2018)), meaning that we serve the requests that have accumulated over a pre-specified time window. The MWM problem can be solved via the classic blossom algorithm (Edmonds 1965) with run time – on a graph \((V, E)\) – of \({\mathcal {O}}(|E| |V|^2)\).
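A minimal sketch of batched MWM (not the paper's exact pipeline) is shown below: the requests accumulated over one batch window are paired with networkx' blossom-based matcher. The request objects and the shareability_weight function are hypothetical placeholders.

```python
# Batched maximum-weight matching over the requests of one time window.
import networkx as nx

def match_batch(requests, shareability_weight):
    """Pair up the requests of one batch window.

    `requests` is a list of hashable request ids; `shareability_weight(r1, r2)`
    returns the gain of pairing the two requests, or None if pairing is infeasible.
    """
    G = nx.Graph()
    G.add_nodes_from(requests)
    for i, r1 in enumerate(requests):
        for r2 in requests[i + 1:]:
            w = shareability_weight(r1, r2)
            if w is not None and w >= 0:          # keep only non-negative gains
                G.add_edge(r1, r2, weight=w)
    # Blossom-based maximum-weight matching, O(|E||V|^2) as noted above.
    pairs = nx.max_weight_matching(G, maxcardinality=False)
    matched = {r for pair in pairs for r in pair}
    singles = [r for r in requests if r not in matched]
    return list(pairs), singles

# Toy usage with a hypothetical weight function.
reqs = ["r1", "r2", "r3", "r4"]
w = lambda a, b: 5.0 if {a, b} in ({"r1", "r2"}, {"r3", "r4"}) else None
print(match_batch(reqs, w))
```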

5.3 k-server/taxi algorithms

Many of these algorithms operate by embedding the input metric space \({\mathcal {X}}\) into a distribution \(\mu\) over Hierarchical Separated Trees (HSTs) (e.g., the classic double-coverage (Chrobak et al. 1990)), and thus, to apply them in practice, it is necessary to examine the size of these trees. Given that the geo-coordinate system is a discrete metric space, we could directly embed it into HSTs. Yet, the size of that space is huge, and hence, for a more compact discretization, we opted to generate the graph of the street network of NYC (see Sect. 3.2.6). The resulting graph for NYC contains 66543 nodes and 95675 edges (5018 and 8086 for Manhattan). There is a clear trade-off here between the accuracy of the embedding and the algorithm's complexity.
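As one illustration (this is not the exact pipeline of Sect. 3.2.6, and node/edge counts will vary with the OpenStreetMap snapshot and simplification settings), such a street-network graph can be obtained with osmnx:

```python
# Fetch the drivable street network of Manhattan; the size of this graph is
# what drives the cost of the HST embedding discussed above.
import osmnx as ox

G = ox.graph_from_place("Manhattan, New York City, New York, USA",
                        network_type="drive")
print(G.number_of_nodes(), G.number_of_edges())
```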

More recent k-server algorithms (e.g., (Dehghani et al. 2017; Lee 2018; Bansal et al. 2015)) use sophisticated ‘online rounding’ techniques; these, however, require maintaining variables for every (time-step, point in the metric space) pair, which makes them prohibitive for any large-scale real-world application: even for one hour (3600 time-steps) and our discretization of Manhattan (5018 nodes), we would need more than 18 million variables (230 million for NYC).

5.4 Observability

Most approaches are centralized and require a global view of the entire window, which is hard to scale. As autonomous agents proliferate, a practical and applicable CAR must be distributed and, ideally, able to run on-device.

6 Vehicle relocation challenges

There are two ways to enforce relocation: passive and active. Ridesharing platforms like Uber and Lyft have implemented market-driven pricing as a passive form of relocation. Counterfactual analysis in (Buchholz 2018) shows that implementing pricing rules can result in daily net surplus gains of up to 232000, and 93000 additional daily taxi-passenger matches. While the gains are substantial, the market might be slow to adapt, and drivers and passengers do not always follow equilibrium policies. In contrast, our approach is active, in the sense that we directly enforce relocation. Moreover, we adopt a more anthropocentric approach: in our setting the demand is fixed, so the goal is not to increase revenue by serving more rides, but rather to improve the QoSFootnote 14.

There are many ways to approach dynamic relocation. Most of the employed relocation approaches are coarse-grained: the network is divided into several zones, blocks, etc. (Guériau and Dusparic 2018; Vosooghi et al. 2019; Martínez et al. 2017), and the entities (e.g., the vehicles) move between the zones. However, compared to other shared mobility systems, dynamic ridesharing poses unique challenges, which make such coarse-grained approaches inappropriateFootnote 15: most of them are centralized – thus computationally intensive and not scalable –, they might not take into account the actions of other vehicles, potentially leading to over-saturation of high-demand areas, and, most importantly, they are slow to adapt to the highly dynamic nature of the problem (e.g., responding to high demand generated by a concert, or to the fact that vehicles remain free for only a few minutes at a time). The problem clearly calls for fine-grained solutions, yet such approaches in the literature remain rather scarce. High Capacity (HC) employs fine-grained relocation: it solves an ILP, which can reach high-quality results, but is neither scalable nor practical. Ideally, we would like a solution that can run on-device. The k-server algorithms perform an implicit relocation, yet they are primarily developed for adversarial scenarios and do not utilize the wealth of historical dataFootnote 16. In reality, requests follow patterns that emerge from habitual human behavior (e.g., during the first half of the day in Manhattan, there are many more drop-offs in Midtown than pickups (Buchholz 2018)).

6.1 Patterns in customer requests

To confirm the existence of transportation patterns, we performed the following analysis: for each request r on January 15Footnote 17, we searched the past three days for requests \(r'\) such that \(|t_{r} - t_{r'}| \le 10\), \(\delta (s_{r}, s_{r'}) \le 250\), and \(\delta (d_{r}, d_{r'}) \le 250\). The results are depicted in Fig. 3. On average, \(13.3\%\) of the trips are repeated across all three previous days, peaking at \(43.7\%\) during rush hours (e.g., 6-8 in the morning). Note that predicting transport demand from historical data is not an easy task; still, \(13.3\%\) corresponds to about 47000 trips, which is significant in raw numbers.

Fig. 3

Percentage of similar trips per hour in Manhattan, January 15, 2016 (blue line). Mean value = \(13.3\%\) (yellow line)
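A simplified sketch of this repeated-trip analysis is given below; the request representation, the time unit (minutes), the distance function (metres), and the pooling of the three previous days into a single history set are assumptions made for illustration.

```python
# Fraction of today's requests with a 'similar' trip in the pooled history:
# time-of-day within dt minutes, source and destination within dd metres.
from dataclasses import dataclass

@dataclass
class Request:
    t: float        # pick-up time (minutes since midnight)
    src: tuple      # (lat, lon) of the source
    dst: tuple      # (lat, lon) of the destination

def repeated_fraction(today, history, dist, dt=10.0, dd=250.0):
    """`dist` is any point-to-point distance function returning metres."""
    def similar(r, p):
        return (abs(r.t - p.t) <= dt
                and dist(r.src, p.src) <= dd
                and dist(r.dst, p.dst) <= dd)
    return sum(any(similar(r, p) for p in history) for r in today) / len(today)
```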

6.2 Relocation matching graph

Given the high density of the requests and the low frictions of the taxis (i.e., taxis remain free for relocation only for a short time window), we opted for a simple, fine-grained matching approach. We use the history to predict a set of expected future requests. Specifically, let D and T be the sampling windows, in days and minutes respectively (we used \(D = 3\) and \(T = 2\)). Let t denote the current time-step. The set of past requests in our sampling window is \({\mathcal {R}}_{\text {past}} = \{r: t_{r} - t \le T\}\), where r appeared at most D days prior to t. The set of expected future requests \({\mathcal {R}}_{\text {future}}\) is generated by sampling from \({\mathcal {R}}_{\text {past}}\). Relocation is performed in a just-in-time manner, every time the set of idle vehicles is non-empty. We generate matching graphs similar to those in Sect. 4.1, and then proceed to match requests to shared rides, and rides to idle taxis; the difference is that the node set of \({\mathcal {G}}_a\) is now \({\mathcal {R}}_{\text {future}} \cup {\mathcal {R}}_{t}\). Finally, each idle taxi starts moving towards the source of its match (since these are expected rides, the source is picked at random from the sources of the two requests composing the ride).
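A minimal sketch of how \({\mathcal {R}}_{\text {future}}\) can be generated is shown below; the request attributes, the day index, and the sample size are hypothetical, and the paper's exact sampling procedure may differ in details.

```python
# Sample expected future requests from the same time-of-day window on the past D days.
import random

D, T = 3, 2   # sampling windows: days and minutes (values used in the paper)

def expected_future_requests(history, current_day, t, sample_size):
    past = [r for r in history
            if 1 <= current_day - r.day <= D      # appeared at most D days earlier
            and 0 <= r.t - t <= T]                # about to appear within T minutes
    return random.sample(past, min(sample_size, len(past)))
```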

7 Evaluation

7.1 Employed CARs

Evaluating all possible combinations of CAR components is infeasible. To make the evaluation tractable, we first consider only the first two steps of the ridesharing problem (i.e., no relocation). When possible, we use the same component for both steps (a) and (b). k-taxi/server algorithms, though, cannot solve step (a); for those CARs we use the best-performing component for step (a) (namely, offline maximum-weight matching (MWM) run in batches). Then, we move on to evaluate step (c), testing only the most promising components (namely MWM and ALMA, plus Greedy as a baseline). We begin by isolating step (c): we fix the component for (a) and (b) to MWM, to have a common ground for evaluating relocation. Finally, we present results on end-to-end solutions. A list of all the evaluated CARs can be found in Table 1, while Table 2 contains a summary of all the evaluated metrics.

Table 2 Evaluated performance metrics (global, passenger (Quality of Service), and platform specific)

7.2 Simulation results

In this section we present the results of our evaluation. For every metric, we report the average value over 8 runs. In what follows, we briefly detail only the most relevant results. Please refer to Appendix A for the complete results, including larger test-cases on the broader NYC area, omitted metrics, standard deviation values, and additional algorithms (e.g., WFA and HC had to be evaluated in smaller test-cases).

Figures 4, 7, 5, and 6 present the results without relocation. We first present results for one hour (Figs. 4 and 7) and the base number of taxis (see Sect. 3.2.2). Then, we show that the results are robust at a larger time-scaleFootnote 18 (Fig. 5) and for a varying number of vehiclesFootnote 19 (2138 - 12828) (Fig. 6). Finally, we present results on step (c) of the Ridesharing problem: dynamic relocation (Table 3, Fig. 8).

Fig. 4

08:00 - 09:00, #Taxis = 4276 (base number). Manhattan, January 15, 2016

Fig. 5

00:00 - 23:59 (full day), #Taxis = 5081 (base number). Manhattan, January 15, 2016

Fig. 6

08:00 - 09:00, #Taxis = \(\{2138, 3207, 4276, 8552, 12828\}\). Manhattan, January 15, 2016

Fig. 7

08:00 - 09:00, #Taxis = 4276 (base number). Manhattan, January 15, 2016

7.2.1 Distance driven

In the small test-case (Fig. 4a), MWM performs best, followed by Bal (\(+7\%\)). ALMA comes third (\(+19\%\)), and then Greedy (\(+21\%\)). Bal's high performance in this metric stems from its use of MWM for step (a), which has the more significant impact on the distance driven. Similar results are observed for the whole day (Fig. 5a), with Bal, ALMA, and Greedy achieving \(+4\%\), \(+18\%\), and \(+22\%\) compared to MWM, respectively. Figure 6a shows that as we decrease the number of taxis, Bal loses its advantage, Greedy falls further behind ALMA (\(9\%\) worse), while ALMA closes the gap to MWM (\(+17\%\)).

7.2.2 Complexity

To estimate the complexity, we measured the elapsed time of each algorithm. Greedy is the fastest (Fig. 4b), closely followed by Har, Bal, and ALMA. ALMA is inherently decentralized: the red overlay denotes ALMA's parallel running time, which is 2.5 orders of magnitude lower than Greedy's.

7.2.3 Time to pick-up

MWM exhibits an exceptionally low time to pick-up (Fig. 4c), lower even than the single-ride baseline. ALMA, Greedy, and Bal are at \(+69\%\), \(+76\%\), and \(+33\%\) relative to MWM, respectively. As before, Fig. 6b shows that as we decrease the number of taxis, Bal loses its advantage and Greedy falls further behind ALMA. Note that, to improve visualization, we removed DC's pick-up time, as it was one order of magnitude larger than Appr.'s.

7.2.4 Delay

PG exhibits the lowest delay (Fig. 4d), but only because it makes \(26\%\) fewer shared rides than the rest of the high-performing algorithms. Among the rest, ALMA has the smallest delay (\(-13\%\) compared to MWM), followed by Greedy (\(-1\%\)), while Bal has \(+63\%\) (all compared to MWM). As the number of taxis decreases (Fig. 6c), ALMA's gains increase further (\(-22\%\) compared to MWM).

Figures 5d and 6d depict the cumulative delay, which is the sum of all delays described in Sect. 3.1.2, namely the time to pair, time to pair with taxi, time to pick-up, and delay. An interesting observation is that reducing the fleet size from 12828 vehicles (\(\times 3.0\) the base number) to just 3207 (\(\times 0.75\) the base number) – a \(75\%\) reduction – results in only approximately 2 minutes of additional delay (Fig. 6d). This demonstrates the great potential for efficiency gains that such technologies have to offer.

Finally, we wanted to investigate the distribution of the achieved QoS metrics and, consequently, the reliability/fairness of each CAR. As such, we plotted in Fig. 7a the sequence of percentilesFootnote 20 of the cumulative delay. As shown, the vast majority of users (\(75\%\)) experience a cumulative delay close to the average value (only 46, 85, 92, and 69 additional seconds over the average for MWM, ALMA, Greedy, and Bal, respectively). Of course, some users experience a high cumulative delay, but they constitute a small percentage: less than \(5\%\) of requests experience a delay of more than 8.5, 13, 13, and 9.5 minutes for MWM, ALMA, Greedy, and Bal, respectively. Given the size of Manhattan and the average speed of taxi vehicles there, such delays are to be expected and, thus, acceptable; ultimately, it is up to the ridesharing platform to impose hard constraints and reject requests with potentially high delay.
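For reference, such a percentile (reliability) curve can be produced directly from the per-request cumulative delays; the snippet below is a generic sketch using synthetic data, not the paper's plotting code.

```python
# Percentile curve of cumulative delays, as plotted in Fig. 7a.
import numpy as np

def delay_percentiles(cumulative_delays, step=1):
    """Return (percentile, delay) pairs for a reliability curve."""
    qs = np.arange(0, 100 + step, step)
    return qs, np.percentile(cumulative_delays, qs)

delays = np.random.exponential(scale=120.0, size=10_000)  # synthetic delays (s)
qs, vals = delay_percentiles(delays)
print(vals[75], vals[95])   # e.g., the 75th and 95th percentiles discussed above
```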

7.2.5 Frictions

Figure 5b shows the driver frictions. In this metric, k-server algorithms seem to outperform matching algorithms by far. Compared to MWM, Bal and Har achieve a \(63\%\) and \(73\%\) decrease, respectively, while ALMA and Greedy achieve a \(26\%\) and \(21\%\) decrease, respectively. Given that we have a fixed supply, lower frictions indicate a more even distribution of rides amongst taxis.

It is important to note that while the results for all other metrics are consistent when moving from the one-hour test-case to the full-day test-case, this is not true for the frictions (see Figs. 9i and 12i and Tables 6 and 10 in the Appendix). This is because taxis that serve zero or one rides are assumed to have zero friction by definition. Algorithms like Bal – which attempts to balance the distance driven by each taxi – utilize each vehicle multiple times, even within the short window of one hour. This results in a deceptively high friction value in the one-hour test-case. As a matter of fact, the number of taxis that served fewer than two rides (and thus had zero friction) in the one-hour test-case was 483 for Bal. For MWM this number is 1368 (almost 3 times larger), for ALMA it is 1181, and for Greedy 1120. This is why we opted to present the frictions for the full-day test-case in Fig. 5b.

7.2.6 Time to pair with taxi & number of shared rides

Excluding the test-case with the smallest taxi fleet (\(\times 0.5\) the base number), the time to pair with a taxi was zero, or close to zero, for all the evaluated algorithms. This demonstrates the potential for efficiency gains and better utilization of resources using smart technologies. The reason for the low time to pair with a taxi is that, for step (b) of the ridesharing problem (matching (shared) rides to taxis), we run the offline algorithms in a just-in-time (JiT) manner, i.e., every time the set of rides (\({\mathcal {P}}_t\)) is non-empty (see Sect. 4.1). We opted to do so for simplicity – the alternative would require running all combinations of batch sizes for both steps (a) and (b). Results from step (a), though, suggest that running in batches is more beneficial (a batch size of two minutes consistently outperformed the JiT version; see Appendix A). There is a clear trade-off: match with a taxi as soon as possible (JiT), so that a vehicle starts moving towards the pick-up earlier, or wait (match in batches every x minutes), potentially allowing for better matches? Answering this question remains open for future work.

The number of shared rides is approximately the same for all the employed algorithms, with the notable exception of PG, which makes \(26\%\) fewer shared rides.

7.2.7 Relocation

The aim of any relocation strategy is to improve the spatial allocation of supply, since serving requests redistributes the taxis, resulting in an inefficient allocation. One could adopt a ‘lazy’ approach, relocating vehicles only to serve requests. While this minimizes the cost of serving a request (e.g., distance driven, fuel, etc.), it results in sub-optimal QoS. Improving the QoS (especially the time to pick-up, since it correlates strongly with passenger satisfaction; see Sect. 3.1.2) plays a vital role in the growth of a company. Thus, the crucial trade-off of any relocation scheme is to improve the QoS metrics while minimizing the excess distance driven.

Table 3 Relocation gains
Fig. 8

Time to Pick-up (s) – End-To-End Solution, January 15, 2016 – 00:00 - 23:59 – Manhattan – #Taxis = 5081

CARs with relocation successfully balance this trade-off (Table 3). In particular, ALMA – the best performing overall – radically improves the QoS metrics by more than \(50\%\) (e.g., it decreases the pick-up time by \(55\%\), and its standard deviation (SD) by \(58\%\)), while increasing the driving distance by only \(6\%\). The cumulative delay is decreased by \(43\%\).

As a final step, we evaluate end-to-end solutions, using MWM, ALMA, and Greedy to solve all three steps of the ridesharing problem. Figure 8 depicts the time to pick-up (error bars denote one SD of uncertainty), a metric highly correlated with passenger satisfaction (Tang et al. 2017; Brown 2016b). We compare against the single-ride baseline (no delay due to sharing a ride; see Sect. 4.1.13). Once more, the proposed relocation scheme results in radical improvements: the time to pick-up (relative to the single-ride baseline) drops from \(+14.09\%\) to \(-41.76\%\) for MWM, from \(+74.14\%\) to \(-9.33\%\) for ALMA, and from \(+86.10\%\) to \(-7.97\%\) for Greedy. This demonstrates that simple relocation schemes can eliminate the negative effects of ridesharing on QoS.

7.2.8 ALMA as an end-to-end CAR

While MWM seems to perform best in total distance driven and most QoS metrics – which is reasonable, since it makes optimal matches amongst passengers – it is hard to scale and requires a centralized solution. In contrast, greedy approaches are appealing\(^{11}\) not only due to their low complexity, but also because real-time constraints dictate short planning windows, which can diminish the benefit of batch optimization solutions compared to myopic approaches (Widdows et al. 2017).

In fact, ALMA is of a greedy nature as well, albeit with a more intelligent back-off scheme; thus, there are scenarios where ALMA significantly outperforms Greedy, as shown by the simulation results. For example, in more challenging scenarios (a smaller taxi fleet, or potentially different types of taxis), the smarter back-off mechanism results in a more pronounced difference.

Most importantly, ALMA was inherently developed for multi-agent applications. Agents make decisions locally, using completely uncoupled learning rules, and require only 1-bit partial feedback (Danassis et al. 2019), making ALMA an ideal candidate for an on-device implementation. This is fundamentally different from a decentralized implementation of, say, the Greedy algorithm: even in decentralized algorithms, the number of communication rounds required grows with the size of the problem, while in practice the real-time constraints impose a limit on the number of rounds, and thus on the size of the problem that can be solved within them.
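To convey the on-device flavour, the following is a deliberately simplified, hypothetical back-off loop in the spirit of ALMA; the precise back-off probabilities, feedback model, and guarantees of (Danassis et al. 2019) differ, and the utilities below are made up.

```python
# Decentralised back-off matching sketch: each free taxi bids on its best
# remaining ride; on collision, taxis with little to lose back off more often.
import random
from collections import defaultdict

def alma_like_matching(utilities, rounds=100):
    """utilities[taxi][ride] -> gain; returns a (partial) taxi -> ride matching."""
    free = set(utilities)
    taken, assignment = set(), {}

    def loss(taxi):
        # gap between the taxi's best and second-best remaining ride
        vals = sorted(u for r, u in utilities[taxi].items() if r not in taken)
        return vals[-1] - (vals[-2] if len(vals) > 1 else 0.0)

    for _ in range(rounds):
        bids = defaultdict(list)
        for taxi in free:
            avail = [(u, r) for r, u in utilities[taxi].items() if r not in taken]
            if avail:
                bids[max(avail)[1]].append(taxi)     # bid on the best remaining ride
        for ride, bidders in bids.items():
            if len(bidders) > 1:                     # collision: low-loss taxis back off
                bidders = [t for t in bidders
                           if random.random() >= 1.0 / (1.0 + loss(t))]
            if len(bidders) == 1:                    # uncontested after back-off
                assignment[bidders[0]] = ride
                taken.add(ride)
                free.discard(bidders[0])
    return assignment

utils = {"taxi1": {"rideA": 5.0, "rideB": 1.0},
         "taxi2": {"rideA": 4.0, "rideB": 3.5}}
print(alma_like_matching(utils))
```

In this sketch, each taxi only needs to know whether its own bid succeeded, mirroring the 1-bit feedback mentioned above.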

Table 4 High level (qualitative) ranking of the evaluated CARs

7.3 High-level analysis

Applying the modular approach we advocate allowed us to thoroughly test a wide variety of state-of-the-art algorithms for ridesharing. When dealing with a multi-objective optimization problem, it is unreasonable to expect a single approach to outperform the competition across the board. Nevertheless, our findings provide convincing evidence to a ridesharing platform as to which CARs would be most suitable for a given set of objectives. Specifically: (i) CARs that rely on offline (in-batches) maximum-weight matching solutions perform well on global efficiency and passenger-related metrics, (ii) CARs based on k-server algorithms perform well on platform-related metrics (e.g., Bal), (iii) lightweight CARs perform better in real-world, large-scale settings due to the short planning windows imposed by the requirement to run in real-time, (iv) a simple, fine-grained relocation scheme based on the history of requests can significantly improve Quality of Service metrics by up to \(50\%\), and finally, (v) we identify a scalable, on-device CAR based on ALMA that performs well across the board. A summary of the results can be found in Table 4.

8 Conclusion

Managing transportation resources on a large scale remains a critical open problem. We initiate the systematic study of Component Algorithms for Ridesharing (CARs), a modular design methodology for ridesharing. To gain insight into the intricate dynamics of the problem, it is essential to evaluate a diverse set of candidate solutions in settings designed to closely resemble reality. We evaluate 14 candidate CARs – focused on the key algorithmic components of ridesharing – over 10 metrics, in such realistic settings covering every aspect of the problem. To the best of our knowledge, this is the first end-to-end evaluation of this magnitude. We show the capacity of simple relocation schemes to radically improve QoS metrics, eliminating the negative effects of ridesharing, and we identify an ALMA-based CAR that offers an efficient (across all metrics), scalable, on-device, end-to-end solution.