Abstract
We study the optimization of large-scale, real-time ridesharing systems and propose a modular design methodology, Component Algorithms for Ridesharing (CAR). We evaluate a diverse set of CARs (14 in total), focusing on the key algorithmic components of ridesharing. We take a multi-objective approach, evaluating 10 metrics related to global efficiency, complexity, passenger, and platform incentives, in settings designed to closely resemble reality in every aspect, focusing on vehicles of capacity two. To the best of our knowledge, this is the largest and most comprehensive evaluation to date. We (i) identify CARs that perform well on global, passenger, or platform metrics, (ii) demonstrate that lightweight relocation schemes can improve the Quality of Service by up to \(50\%\), and (iii) highlight a practical, scalable, on-device CAR that works well across all metrics.
1 Introduction
The emergence and widespread use of Mobility-on-Demand systems in recent years has had a profound impact on urban transportation in a variety of ways. Among other advantages, these systems have the potential to mitigate congestion costs (such as commute times, fuel usage, and accident propensity), enable marketplace optimization for both passengers and drivers, and provide substantial environmental benefits. A prominent example is ridesharing^{Footnote 1}. However, ridesharing also causes some passenger disruption, due to reduced flexibility, increased travel time, and loss of privacy and convenience. Thus, at the core of any ridesharing platform lies the need for an efficient balance between the incentives of the passengers and those of the platform^{Footnote 2}.
Optimizing the usage of transportation resources is not an easy task, especially for cities like New York, with more than 13,000 taxis and 270 ride requests per minute. For example, Buchholz (2018) estimates that 45,000 customer requests remain unmet each day in New York, despite the fact that approximately 5000 taxis are vacant at any time. In fact, in aggregate, drivers spend about \(47\%\) of their time not serving any passengers (Buchholz 2018). Moreover, up to \(80\%\) of the taxi rides in Manhattan could be shared by two riders, with an increase in travel time of only a few minutes (Alonso-Mora et al. 2017a). A more sophisticated matching policy could mitigate these costs by better allocating available supply to demand. As a second example, coordinated vehicle relocation could be employed to bridge the spatial supply/demand imbalance and improve passenger satisfaction and Quality of Service (QoS) metrics. Drivers often relocate to find passengers: \(61.3\%\) of trips begin in a different neighborhood than the drop-off location of the last passenger (Buchholz 2018), yet drivers currently move without any coordinated search behavior, resulting in spatial search frictions.
Given the importance of the problem for transportation and the economy, it is not surprising that the related literature is populated with a plethora of papers, proposing different solutions along different axes, such as efficiency (Santi et al. 2014; Alonso-Mora et al. 2017a; Agatz et al. 2011; Ashlagi et al. 2017; Huang et al. 2019; Bienkowski et al. 2018; Dickerson et al. 2018; Fagnant and Kockelman 2018; Lokhandwala and Cai 2018), platform revenue (Banerjee et al. 2017; Chen et al. 2019), driver incentives (Ma et al. 2019; Yuen et al. 2019; Garg and Nazerzadeh 2020), fairness (Lesmana et al. 2019; Sühr et al. 2019; Xu and Xu 2020; Nanda et al. 2020), and reliability (Fielbaum and Alonso-Mora 2020; Alonso-González et al. 2020), as well as papers analyzing the effects on sharing economies (Kooti et al. 2017; Jiang et al. 2018; Ghili and Kumar 2020; Asadpour et al. 2020).
It is well-documented (e.g., Lesmana et al. 2019) that these different desiderata often conflict (e.g., fairness vs. revenue), and therefore we should not expect a single ridesharing algorithm to be superior for all of them; rather, the design of such algorithms should depend on the goals of the designer and on which properties they consider most important for the application at hand. Thus, we want a flexible and adaptable design that can be made to work best with respect to any set of such objectives with ‘a few tweaks’.
To enable this, we propose a modular approach to algorithm design in ridesharing, in which an algorithm consists of three different components, namely (a) matching passengers with other passengers, (b) assigning rides to vehicles, and (c) vehicle relocation, in which taxis that are not serving passengers move closer to positions where requests are expected to appear in the near future. Each component can then be seen as a different (sub)algorithm, and those algorithms can be chosen to suit the specific objectives of the designer. In fact, our approach draws inspiration from several successful algorithms in the ridesharing literature, such as the well-known High Capacity algorithm of (Alonso-Mora et al. 2017a) and the recent algorithm of (Riley et al. 2020), which can both be cast as instances of this modular design.
1.1 Our contributions
1.1.1 CARs
We initiate the systematic study of Component Algorithms for Ridesharing (CARs). A CAR is an algorithm consisting of three sub-algorithms, each solving one of the following components of the ridesharing problem (Fig. 1).

Matching passengers to other passengers. For this component, the underlying algorithmic problem is that of Online Maximum Weight Matching, where the “online” part stems from the fact that passenger requests appear at different points in time, and we have to account for the future when deciding which passengers to match. As such, a wide range of classic as well as modern matching algorithms is at our disposal.

Assigning rides to vehicles. For this component, the underlying algorithmic problem can be seen either as an Online Maximum Weight Bipartite Matching problem, or as an instance of the k-Taxi problem and, by extension, of the famous k-Server problem from the literature of online algorithms. As above, there is a large set of classic and modern solutions that one can plug in as components for this part.

Vehicle Relocation. For this component, the objective is to use historical data to predict the locations of future requests and move idle taxis closer to those locations. From an algorithmic standpoint, this problem can be cast either as a k-Facility Location problem, concerned with the optimal placement of facilities (taxis) to minimize transportation costs, or as an Online Maximum Weight Matching problem on the history of requests.
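To make the modular structure concrete, the following Python sketch shows one way a CAR could be expressed as three interchangeable callables. The class `CAR`, the type aliases, and the placeholder components (`no_sharing`, `nearest_free_taxi`, `stay_put`) are hypothetical illustrations, not the interfaces used in this work, and a planar Manhattan distance stands in for the real distance function.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Request:
    rid: int
    source: Point
    dest: Point


# A ride is a pair of requests; a single ride repeats the same request.
Ride = Tuple[Request, Request]

# Hypothetical component signatures, for illustration only.
RequestMatcher = Callable[[List[Request]], List[Ride]]
RideAssigner = Callable[[List[Ride], List[Point]], List[Tuple[Ride, int]]]
Relocator = Callable[[List[Point], List[Point]], List[Point]]


@dataclass
class CAR:
    match_requests: RequestMatcher  # (a) request-request matching
    assign_rides: RideAssigner      # (b) ride-taxi assignment
    relocate: Relocator             # (c) relocation of idle taxis

    def step(self, requests, taxis, history):
        rides = self.match_requests(requests)
        assignment = self.assign_rides(rides, taxis)
        busy = {taxi for _, taxi in assignment}
        idle = [pos for i, pos in enumerate(taxis) if i not in busy]
        return assignment, self.relocate(idle, history)


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Trivial placeholder components that can be swapped out independently.
def no_sharing(requests):
    return [(r, r) for r in requests]


def nearest_free_taxi(rides, taxis):
    free = set(range(len(taxis)))
    out = []
    for ride in rides:
        if not free:
            break  # remaining rides wait for the next time step
        best = min(free, key=lambda i: manhattan(taxis[i], ride[0].source))
        free.remove(best)
        out.append((ride, best))
    return out


def stay_put(idle, history):
    return list(idle)
```

Swapping, say, `nearest_free_taxi` for a k-server rule would change only the ride-to-taxi component, leaving the other two untouched, which is the point of the modular design.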
1.1.2 Evaluation platform
While several papers in the literature provide evaluations on realistic datasets (e.g., see Riley et al. 2020; Santi et al. 2014; Alonso-Mora et al. 2017a; Agatz et al. 2011; Santos and Xavier 2013; Danassis et al. 2019), they either (a) only consider parts of the ridesharing problem and therefore do not propose end-to-end solutions, (b) only evaluate a few newly proposed algorithms against some basic baselines, (c) only consider a limited number of performance metrics, predominantly with regard to overall efficiency and often without regard to QoS metrics, or (d) perform evaluations on a much smaller scale, thus not capturing the real-life complexity of the problem. In contrast, our work provides a comprehensive evaluation of a large number of proposed algorithms, over multiple different metrics, and for real-world scale, end-to-end problems. Specifically:

We meticulously design an experimental setting to resemble reality as closely as possible in every aspect of the problem. To the best of our knowledge, this is the first end-to-end experimental evaluation of this magnitude, and it could serve as common ground for evaluating future work in a setting designed to capture real-world challenges.

We evaluate our CARs for a host of different objectives (10 metrics) related to global efficiency, complexity, passenger, and platform incentives (see Table 2).
We focus on (shared) rides of at most two requests (i.e., vehicles of capacity two) for two reasons, complexity and passenger satisfaction, as we explain in detail in Sect. 3.2.4.
1.1.3 Results
Applying the modular approach advocated above, we design a large set of CARs, based on different classic and modern algorithms for the different components (14 in total, see Table 1). The main takeaways are the following:

CARs based on offline, in-batch maximum-weight matching approaches perform well on global efficiency and passenger-related metrics.

CARs based on k-server algorithms (e.g., the Balance algorithm (Manasse et al. 1990)) perform well on platform-related metrics.

Lightweight CARs perform better in real-world, large-scale settings, since real-time constraints dictate short planning windows, which can diminish the benefit of cumbersome optimization techniques compared to myopic approaches.

Simple, lightweight relocation schemes can significantly improve Quality of Service metrics by up to \(50\%\).

We identify a scalable, on-device CAR based on ALMA (Danassis et al. 2019) that performs well across the board.
Our findings provide a ridesharing platform with evidence as to which combination of components would be most suitable for a given set of objectives.
2 Discussion and related work
The literature on ridesharing is rather extensive; here we only highlight the key algorithmic principles in our design of CARs.
The dynamic ridesharing problem (and the closely related dynamic dial-a-ride problem; see Agatz et al. 2012) has drawn the attention of diverse disciplines over the past few years, from operations research to transportation engineering and computer science. Solution approaches include constrained optimization (Qian et al. 2017; Simonetto et al. 2019; Agatz et al. 2011; Alonso-Mora et al. 2017a; Riley et al. 2020), weighted matching (Ashlagi et al. 2017; Bei and Zhang 2018; Dickerson et al. 2018; Zhao et al. 2019; Danassis et al. 2019), other heuristics (Qian et al. 2017; Santos and Xavier 2015; Bathla et al. 2018; Lowalekar et al. 2019; Santos and Xavier 2013; Pelzer et al. 2015; Gao et al. 2017; Shah et al. 2020), reinforcement learning (Guériau and Dusparic 2018; Li et al. 2019; He and Shin 2019), and model predictive control (Chen and Cassandras 2019; Riley et al. 2020; Tsao et al. 2019), among others. We refer the interested reader to the following surveys (Agatz et al. 2012; Silwal et al. 2019; Furuhata et al. 2013; Ho et al. 2018; Mourad et al. 2019; Cordeau and Laporte 2007) for reviews of the optimization challenges, various algorithmic designs adopted over the years, a classification of existing ridesharing systems, models and algorithms for shared mobility, and finally models and solution methodologies for the dial-a-ride problem, respectively.
As we mentioned in the introduction, the key algorithmic components of ridesharing are the following. First, it is an online problem, as the decisions made at some point in time clearly affect the possible decisions in the future; therefore, the literature on online algorithms and competitive analysis (Borodin and El-Yaniv 2005; Manasse et al. 1988) offers clear-cut candidates for CARs. Second, all of the components can be seen as some type of matching, both on bipartite graphs (for matching passengers with taxis, or idle taxis with ‘future’ requests) and on general graphs (for matching passengers to shared rides). In fact, several of the algorithms that have been proposed in the literature for the problem address different variants of online matching.
Finally, ridesharing displays an inherent connection to the k-taxi problem (Coester and Koutsoupias 2019; Buchbinder et al. 2020; Fiat et al. 1994; Kosoresow 1996), which, in turn, is a generalization of the well-known k-server problem (Koutsoupias and Papadimitriou 1995; Koutsoupias 2009)^{Footnote 3}. In the k-taxi problem, once a request appears (with a source and a destination), one of the k taxis at the platform’s disposal must serve the request. Viewing shared rides (multiple passengers that have already been matched in a previous step) as requests, one can directly apply k-taxi (and k-server) algorithms to the ridesharing setting. Granted, k-server algorithms have been designed to operate in a more challenging setting in which (a) the requests have to be served immediately, whereas in ridesharing there is normally some leeway, often at the expense of customer satisfaction, and (b) the positions of requests are typically chosen adversarially, rather than following some distribution, as is the case in reality. Despite this, the fundamental idea behind these algorithms is a pivotal part of ridesharing, as it aims to serve existing requests efficiently while placing the vehicles as well as possible to serve future requests. This is also the main principle of the relocation strategies for idle taxis.
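As an illustration of applying a k-server rule to rides, here is a minimal sketch of the Balance rule (Manasse et al. 1990) adapted to k-taxi requests: each request goes to the taxi that minimizes its cumulative distance driven plus its distance to the pickup. Requests are served immediately and in arrival order, as in the k-taxi setting; the function name, the planar Manhattan metric, and the choice to count both the empty and the loaded leg in the running total are assumptions of this sketch.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def balance_k_taxi(taxis: List[Point], requests: List[Tuple[Point, Point]]):
    """Serve each (source, destination) request immediately, in arrival order.

    Balance rule: pick the taxi minimizing (distance driven so far +
    distance to the pickup). Counting both the empty and the loaded leg
    in the running total is an assumption of this sketch.
    """
    pos = list(taxis)
    driven = [0.0] * len(pos)
    choices = []
    for source, dest in requests:
        i = min(range(len(pos)),
                key=lambda j: driven[j] + manhattan(pos[j], source))
        driven[i] += manhattan(pos[i], source) + manhattan(source, dest)
        pos[i] = dest  # the taxi ends the ride at the destination
        choices.append(i)
    return choices, pos, driven
```

With taxis at (0, 0) and (10, 0), after taxi 0 serves a ride from (1, 0) to (9, 0), a new request at (8, 0) goes to taxi 1 even though taxi 0 is closer: Balance also weighs the distance each taxi has already driven, whereas a purely greedy nearest-taxi rule would pick taxi 0.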
The algorithms that we consider are appropriate modifications of the most significant ones that have been proposed for the aforementioned key algorithmic primitives of the ridesharing problem, as well as heuristic approaches that are based on the same principles but were specifically designed with the ridesharing application in mind. We emphasize that such modifications are needed, primarily because many of these algorithms were tailored to subproblems of the ridesharing setting, and end-to-end solutions in the literature are rather scarce.
Much of the related work in the literature focuses on approaches that are inherently centralized and require knowledge of the full ridesharing network, which makes them rather computationally intensive. As an additional goal of our investigation, we would like to identify solutions that are lightweight, decentralized, and, ideally, run on-device. Of course, some hybrid and decentralized approaches for the ridesharing problem have been proposed (e.g., (Simonetto et al. 2019; Guériau and Dusparic 2018)), and several of the algorithms that we include in our experimental evaluation can be implemented in a decentralized manner (e.g., (Giordani et al. 2010; Ismail and Sun 2017; Zavlanos et al. 2008; Bürger et al. 2012)), but that would typically require a larger amount of communication between the agents (in this case, the vehicles). As it turns out, though, the ALMA algorithm of (Danassis et al. 2019), which was designed with precisely these objectives in mind (low computational complexity, scalability, and low communication cost), performs very well across the board with respect to our objectives.
The third component of our CARs is the relocation of idle taxis. Relocation is an important component of a successful ridesharing application. Many studies on shared mobility systems have shown that adopting a relocation strategy can help improve system performance in their specific context (Guériau and Dusparic 2018; Vosooghi et al. 2019; Martínez et al. 2017; Bélanger et al. 2016; Ruch et al. 2018; Alonso-Mora et al. 2017a; Buchholz 2018; Lioris et al. 2016; Spieser et al. 2014; Tsao et al. 2019; van Engelen et al. 2018; Wen et al. 2017; Wallar et al. 2018). Strategies include using a short window of known active requests (Alonso-Mora et al. 2017a), historical demand (Guériau and Dusparic 2018; Alonso-Mora et al. 2017b; Fielbaum et al. 2021b; Zhou et al. 2013; Xue et al. 2015; van Engelen et al. 2018), or techniques to predict future demand (Spieser et al. 2016). Yet, relocation by nature increases vehicle travel distance, leading to undesirable consequences (economic, environmental, maintenance, management of human resources, etc.); thus, a balance needs to be struck. Most of the employed relocation approaches are coarse-grained: the network is generally divided into several zones, blocks, etc. (Guériau and Dusparic 2018; Vosooghi et al. 2019; Martínez et al. 2017), and the entities (e.g., the vehicles) move between the zones. However, compared to other shared mobility systems, dynamic ridesharing poses unique challenges that make such coarse-grained approaches inappropriate: most of them are centralized (thus computationally intensive and not scalable), they might not take into account the actions of other vehicles, potentially leading to oversaturation of high-demand areas, and, most importantly, they are slow to adapt to the highly dynamic nature of the problem (e.g., responding to high demand generated by a concert, or the fact that vehicles remain free for only a few minutes at a time).
The problem clearly calls for fine-grained solutions, yet such approaches are still rather scarce in the literature. In this paper, we employ such a fine-grained relocation scheme (similarly to (Alonso-Mora et al. 2017a)), based on matching between the idle taxis and the potential requests, which is better suited for the problem at hand.
Relocation can be viewed either as the k-center or k-Facility Location problem (Guha and Khuller 1999), or as an Online Maximum Weight Matching problem on the history of requests. Given the high complexity of the former problems (both are NP-hard and, in fact, APX-hard (Hsu and Nemhauser 1979; Feder and Greene 1988)), we opted for the latter interpretation.
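The matching view of relocation can be illustrated with a simple greedy matching between idle taxis and historical pickup locations. This is a sketch under stated assumptions, not the relocation scheme evaluated here: greedy selection by increasing distance is a stand-in for a proper (online) maximum-weight matching, the Manhattan metric is planar, and the optional `max_dist` cap, which limits the extra driving that relocation causes, is a hypothetical parameter.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def relocate_greedy(idle: List[Point], history: List[Point],
                    max_dist: float = float("inf")) -> List[Point]:
    """Greedily match idle taxis to historical pickup locations.

    Pairs are taken in order of increasing distance; each taxi and each
    historical request is used at most once. Taxis left unmatched stay put.
    `max_dist` (hypothetical) caps the relocation distance.
    """
    candidates = sorted(
        (manhattan(t, h), i, j)
        for i, t in enumerate(idle)
        for j, h in enumerate(history)
        if manhattan(t, h) <= max_dist
    )
    used_taxis, used_history = set(), set()
    targets = list(idle)
    for _, i, j in candidates:
        if i in used_taxis or j in used_history:
            continue
        used_taxis.add(i)
        used_history.add(j)
        targets[i] = history[j]
    return targets
```

Because each historical request can attract at most one taxi, this formulation naturally avoids sending many vehicles to the same hotspot, one of the concerns raised above for coarse-grained, zone-based schemes.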
3 Problem statement & modeling
In this section we formally present the Ridesharing problem. To avoid introducing unnecessary notation, we only present the description of the model here; precise notation and details are provided in the respective sections where they are used.
In the Ridesharing problem there is a (potentially infinite) metric space \({\mathcal {X}}\) representing the topology of the environment, equipped with a distance function \(\delta : {\mathcal {X}} \times {\mathcal {X}} \rightarrow \mathbb {R}_{\ge 0}\). Both are known in advance. At any moment, there is a (dynamic) set of available taxi vehicles \({\mathcal {V}}_t\), ready to serve customer requests (i.e., drive to the pickup location and subsequently to the destination). Between serving requests, vehicles can relocate to locations of potentially higher demand, to mitigate spatial search frictions. Customer requests appear in an online manner at their respective pickup locations, wait to potentially be matched to a shared ride, and are finally served by a taxi to their respective destinations. For two requests to share a ride, they must satisfy spatial and temporal constraints. The former dictates that requests should be matched only if there is good spatial overlap between their routes. The latter means that requests cannot be matched, even with perfect spatial overlap, unless they are both ‘active’ at the same time. Finally, ridesharing is an inherently online problem, as we are unaware of the requests that will appear in the future and need to make decisions before the requests expire, while taking into account the dynamics of the fleet of taxis.
3.1 Performance metrics
The goal is to minimize the cumulative distance driven by the fleet of taxis, while maintaining high Quality of Service (QoS), given that we serve all requests (service guarantee). Serving all requests improves passenger satisfaction and, most importantly, allows us to ground our evaluation in a common scenario, ensuring a fair comparison.
3.1.1 Global metrics
Distance Driven: Minimize the cumulative distance driven by all vehicles for serving all the requests. We chose this objective as it directly correlates to passenger, company, and environmental objectives (minimize operational cost, delay, CO\(_2\) emissions, maximize the number of shared rides, improve QoS, etc.). All of the evaluated algorithms have to serve all the requests, either as shared, or single rides.
Complexity: Real-world time constraints dictate that the employed solution produces results within a reasonable time frame^{Footnote 4}.
3.1.2 Passenger specific metrics—Quality of Service (QoS)
Time to Pair: Expected time to be paired in a shared ride, i.e., \(\mathbb {E}[t_{\text {paired}} - t_{\text {open}}]\), where \(t_{\text {open}}\) and \(t_{\text {paired}}\) denote the time the request appeared and the time it was paired as a shared ride, respectively. If the request is served as a single ride, then \(t_{\text {paired}}\) refers to the time the algorithm chose to serve it as such.
Time to Pair with Taxi: Expected time to be paired with a taxi, i.e., \(\mathbb {E}[t_{\text {taxi}} - t_{\text {paired}}]\), where \(t_{\text {taxi}}\) denotes the time the (shared) ride was paired with a taxi.
Time to Pickup: Expected time to passenger pickup, i.e., \(\mathbb {E}[t_{\text {pickup}} - t_{\text {taxi}}]\), where \(t_{\text {pickup}}\) denotes the time the request was picked up.
Delay: Additional travel time over the expected direct travel time (when served as a single ride instead of a shared ride), i.e., \(\mathbb {E}[(t_{\text {dest}} - t_{\text {pickup}}) - (t'_{\text {dest}} - t_{\text {pickup}})]\), where \(t_{\text {dest}}\) and \(t'_{\text {dest}}\) denote the time the request reaches its destination and the time it would have reached it as a single ride, respectively.
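The four QoS definitions above can be estimated from per-request timestamps by replacing each expectation with a sample mean, as in the sketch below. The `Trace` record and the `qos_metrics` function are hypothetical names chosen for illustration; note that the delay term simplifies to \(t_{\text {dest}} - t'_{\text {dest}}\), since the pickup time cancels.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Trace:
    t_open: float
    t_paired: float
    t_taxi: float
    t_pickup: float
    t_dest: float
    t_dest_single: float  # t'_dest: arrival time had it been a single ride


def qos_metrics(traces: List[Trace]) -> Dict[str, float]:
    # Sample means stand in for the expectations E[.] in the definitions.
    return {
        "time_to_pair": mean(x.t_paired - x.t_open for x in traces),
        "time_to_pair_with_taxi": mean(x.t_taxi - x.t_paired for x in traces),
        "time_to_pickup": mean(x.t_pickup - x.t_taxi for x in traces),
        "delay": mean((x.t_dest - x.t_pickup) - (x.t_dest_single - x.t_pickup)
                      for x in traces),
    }
```

Keeping the three waiting segments separate, rather than reporting a single total wait, is what allows the per-segment analysis of inefficiencies described below.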
Research conducted by ridesharing companies shows that passengers’ satisfaction remains sufficiently high as long as the pickup time is below a certain threshold. This is corroborated by data on booking cancellation rates against pickup time (Tang et al. 2017). In other words, passengers would rather have a short pickup time and a long detour than vice versa (Brown 2016b). This also suggests that an effective relocation scheme can considerably improve passenger satisfaction by reducing the average pickup time (see Sect. 7.2.7).
Given the importance of short pickup times in passengers’ satisfaction, we opted to distinguish and study each segment of the waiting process separately (‘Time to Pair’, ‘Time to Pair with Taxi’, and ‘Time to Pickup’). To the best of our knowledge, we are the first to do so. Such analysis can provide a clear picture of sources of inefficiency to a ridesharing platform, and improve the overall satisfaction which in turn correlates to the growth of the company.
3.1.3 Platform specific metrics
Quality of Service (QoS): Refers to the aforementioned passenger-specific metrics^{Footnote 5}. Improving the QoS offered to customers correlates with the growth of the company.
Number of Shared Rides: Related to profit. By carrying more than one passenger at a time, vehicles can serve more requests in a day, which consequently increases income (Widdows et al. 2017). The matching rate is especially important in the nascent stage of a ridesharing platform (Dutta and Sholley 2018).
Frictions: Waiting time experienced by drivers between serving requests (i.e., the time between dropping off a ride and getting matched with another). Search frictions occur when drivers are unable to locate rides due to a spatial imbalance between supply and demand. Even though in our scenario matchings are performed automatically, without any searching by the drivers, lower frictions indicate a better distribution of the platform’s supply.
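Driver frictions can be measured in the same spirit, as the average gap between a taxi's drop-off and its next match. In this sketch, the input layout (a list of `(t_match, t_dropoff)` pairs per taxi) and the function name are assumptions, and a taxi that served only one ride contributes no gap.

```python
from statistics import mean
from typing import Dict, List, Tuple


def mean_friction(rides_per_taxi: Dict[int, List[Tuple[float, float]]]) -> float:
    """Average idle gap between a drop-off and the same taxi's next match.

    Each taxi maps to a list of (t_match, t_dropoff) pairs, one per ride served.
    """
    gaps = []
    for rides in rides_per_taxi.values():
        rides = sorted(rides)  # chronological order by match time
        for (_, t_drop), (t_next_match, _) in zip(rides, rides[1:]):
            gaps.append(t_next_match - t_drop)
    return mean(gaps) if gaps else 0.0
```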
3.2 Modeling
Our evaluation setting is meticulously designed to resemble reality as closely as possible, in every aspect of the problem. We achieve this by using actual data from NYC’s yellow taxi trip records^{Footnote 6} (both for modeling customer requests and taxis) and by running our simulations at the scale of the actual problem faced by ridesharing platforms (we run simulations with more than 390,000 requests and 12,000 taxis). Moreover, we have carefully designed every detail of the problem, such as the speed of the vehicles, their initial positions, and the distance function. In what follows, we describe each design aspect in detail.
3.2.1 Dataset
We used the yellow taxi trip records of 2016, provided by the NYC Taxi and Limousine Commission\(^6\). The dataset was cleaned to remove requests with travel time shorter than 1 minute or with invalid geolocations (e.g., outside Manhattan, Bronx, Staten Island, Brooklyn, or Queens). For every request, the dataset provides, among other fields, the pickup and drop-off times and geolocation coordinates. Time is discrete, with a granularity of 1 minute (same as the dataset). On average, there are 272 new requests per minute, totaling 391,479 requests in the broader NYC area (352,455 in Manhattan) on the evaluated day (January 15). Figure 2 depicts the number of requests per minute on that day.
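The cleaning and per-minute binning steps can be sketched as follows. The record field names are hypothetical, and the latitude/longitude bounding box is only a rough stand-in for the borough check described above (a box also admits some points outside the five boroughs); a faithful implementation would test against the actual borough boundaries.

```python
from collections import Counter
from datetime import datetime, timedelta

# Rough bounding box around NYC (assumption; the text filters by borough).
LAT_MIN, LAT_MAX, LON_MIN, LON_MAX = 40.49, 40.92, -74.26, -73.70


def in_bbox(lat: float, lon: float) -> bool:
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def clean(records):
    """Drop trips shorter than 1 minute or with out-of-area coordinates."""
    kept = []
    for rec in records:
        if rec["dropoff_time"] - rec["pickup_time"] < timedelta(minutes=1):
            continue
        if not (in_bbox(*rec["pickup"]) and in_bbox(*rec["dropoff"])):
            continue
        kept.append(rec)
    return kept


def requests_per_minute(records) -> Counter:
    """Bin requests into 1-minute time steps by their pickup time."""
    return Counter(r["pickup_time"].replace(second=0, microsecond=0)
                   for r in records)
```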
3.2.2 Taxi vehicles
A unique feature of NYC yellow taxis is that they may only be hailed from the street and are not authorized to conduct pre-arranged pickups. This provides an ideal setting for a counterfactual analysis for several reasons: (1) we can assume a realistic position of each taxi at the beginning of the simulation (its last drop-off location); (2) door-to-door service can be inefficient (Fielbaum et al. 2021a; Stiglic et al. 2015), so users may be asked to walk to or from a nearby fast street, and, given that users have presumably hailed the taxis from larger streets, this results in a more accurate modeling of the origins of supply and demand; and (3) all observed rides are obtained through search, so, assuming reasonable prices and delays, customers neither have nor are willing to take an alternative means of transportation. The latter validates our choice that all of the considered algorithms must eventually serve all the requests.
By law, there are 13,587 taxis in NYC\(^{12}\). The majority of the results presented in this paper use a much lower number of vehicles (what we call the base number) for three reasons: (1) to reduce the complexity of the problem, given that most of the employed algorithms cannot handle such a large number of vehicles; (2) to evaluate under resource scarcity, which makes the problem harder and better differentiates between the results; and (3) to investigate the possibility of a more efficient utilization of resources, with minimal cost to consumers. However, we still present simulations for a wide range of fleet sizes, up to nearly the total number.
The number, initial location, and speed of the taxi vehicles were calculated as follows:

We calculated the base number of taxis as the minimum number of taxis required to serve all requests as single rides (no ridesharing): whenever a request appears and all taxis are occupied serving other requests, we increase the required number of taxis by one. This resulted in around 4000 to 5000 vehicles (depending on the size of the simulation; see Sect. 7.2). Simulations were conducted for \(\{\times 0.5, \times 0.75, \times 1.0, \times 2.0, \times 3.0\}\) the base number.

Given a number of taxis, V, the initial position of each taxi is the drop-off location of one of the last V requests prior to the starting time of the simulation. To avoid a cold start, we compute the drop-off time of each of these requests and assume the corresponding vehicle is occupied until then.

The vehicles’ average speed is estimated at 6.2 m/s (22.3 km/h), based on the trip distance and time per trip as reported in the dataset, and corroborated by the related literature (in (Santi et al. 2014) the speed was estimated at \(5.5\) to \(8.5\) m/s, depending on the time of day).
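The base-number rule and the initial placement above can be sketched as below. We assume a taxi is busy exactly from a request's appearance until its recorded drop-off (ignoring the empty drive to the pickup); under that assumption the rule reduces to the classic interval-partitioning count, i.e., the maximum number of simultaneously active trips. Both function names are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def base_fleet_size(trips: Iterable[Tuple[float, float]]) -> int:
    """Minimum number of taxis needed to serve every trip as a single ride.

    Each trip is (t_request, t_dropoff); a taxi is assumed busy over exactly
    that interval (the empty drive to the pickup is ignored in this sketch).
    """
    busy: List[float] = []  # min-heap of times at which busy taxis become free
    fleet = 0
    for start, end in sorted(trips):
        while busy and busy[0] <= start:
            heapq.heappop(busy)  # this taxi has finished its previous trip
        if len(busy) == fleet:  # every taxi is occupied: add one more
            fleet += 1
        heapq.heappush(busy, end)
    return fleet


def initial_fleet(prior_trips, num_taxis):
    """Place taxis at the drop-offs of the last `num_taxis` prior requests.

    prior_trips: (t_request, dropoff_location, t_dropoff) tuples for requests
    before the simulation start; each taxi stays busy until its t_dropoff.
    """
    if num_taxis <= 0:
        return []
    last = sorted(prior_trips, key=lambda x: x[0])[-num_taxis:]
    return [(loc, t_drop) for _, loc, t_drop in last]
```

For instance, the trips (0, 10), (1, 5), (2, 3), (6, 12), and (11, 13) need three taxis, since the first three overlap at time 2; the evaluated fleet sizes would then be the multiples \(\times 0.5, \ldots, \times 3.0\) of that count.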
3.2.3 Customer requests
A request, r, is a tuple \(\langle t_r, s_r, d_r, k_r \rangle\). Request r appears (becomes open) at its pickup time \(t_r\) and pickup location \(s_r\); \(d_r\) denotes its destination. Each request has a willingness to wait, \(k_r\), to find a match (rideshare); i.e., we assume dynamic waiting periods per request. The rationale behind \(k_r\) is that requests with longer trips are more willing to wait to find a match than requests with nearby destinations. After \(k_r\) time steps, request r becomes critical. If a critical request is not matched, it has to be served as a single ride (recall that in our setting all of the requests must be served). Let \({\mathcal {R}}_t^{\text {open}}, {\mathcal {R}}_t^{\text {critical}}\) denote the sets of open and critical requests, respectively, and let \({\mathcal {R}}_t = {\mathcal {R}}_t^{\text {open}} \cup {\mathcal {R}}_t^{\text {critical}}\).
We calculate \(k_r\) as in related literature (Danassis et al. 2019). Let \(w_{\text {min}}\) and \(w_{\text {max}}\) be the minimum and maximum possible waiting time, i.e., \(w_{\text {min}} \le k_r \le w_{\text {max}}, \forall r\). Knowing \(s_r\) and \(d_r\), we can compute the expected trip time \(\mathbb {E}[t_{\text {trip}}]\). Assuming people are willing to wait in proportion to their trip time, let \(k_r = q \times \mathbb {E}[t_{\text {trip}}]\), where \(q \in [0, 1]\). The parameters \(w_{\text {min}}, w_{\text {max}}\), and q can be set by the ridesharing company based on customer satisfaction (following (Danassis et al. 2019), we let \(w_{\text {min}} = 1, w_{\text {max}} = 3\), and \(q = 0.1\)).
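A minimal sketch of the willingness-to-wait computation, using the parameter values above. Two assumptions are ours: \(\mathbb {E}[t_{\text {trip}}]\) is estimated as trip distance divided by the 6.2 m/s average speed from Sect. 3.2.2, and \(q \times \mathbb {E}[t_{\text {trip}}]\) is clipped to \([w_{\text {min}}, w_{\text {max}}]\), which is how we read the stated bounds. The function names are hypothetical.

```python
W_MIN, W_MAX, Q = 1.0, 3.0, 0.1  # minutes; values quoted in the text
SPEED_M_PER_MIN = 372.0          # 6.2 m/s x 60 s (assumed trip-time estimator)


def willingness_to_wait(trip_distance_m: float) -> float:
    """k_r = q * E[t_trip], clipped to [w_min, w_max] (in minutes)."""
    expected_trip_min = trip_distance_m / SPEED_M_PER_MIN
    return min(W_MAX, max(W_MIN, Q * expected_trip_min))


def is_critical(t_now: float, t_r: float, k_r: float) -> bool:
    """A request becomes critical once it has waited k_r time steps."""
    return t_now - t_r >= k_r
```

A 7.44 km trip (20 minutes at 372 m/min) yields \(k_r = 2\) minutes, while very short and very long trips are clipped to 1 and 3 minutes, respectively.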
3.2.4 Rides
A (shared) ride, \(\rho\), is a pair \(\langle r_1, r_2 \rangle\) composed of two requests. If a request r is served as a single ride, then \(r_1 = r_2 = r\). Let \({\mathcal {P}}_t\) denote the set of rides waiting to be matched to a taxi at time t. Contrary to some recent literature on high-capacity ridesharing (e.g., (Alonso-Mora et al. 2017a; Lowalekar et al. 2019)), we purposefully restricted ourselves to rides of at most two requests for two reasons: complexity and passenger satisfaction. The complexity of the problem grows rapidly as the number of potential matches increases, while most of the proposed/evaluated approaches already struggle to tackle matchings of size two at the scale of a real-world application. Moreover, even though a fully utilized vehicle would ultimately be a more efficient use of resources, it diminishes passenger satisfaction (a frequent worry being that the ride will become interminable, according to internal research by ridesharing companies) (Widdows et al. 2017; Brown 2016a). Given that serving all requests is a hard constraint, we do not impose a time limit on matching rides with taxis; instead, we treat it as a QoS metric.
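To illustrate how rides of at most two requests can be formed in batches, the sketch below greedily pairs the currently open requests by distance saved and turns unmatched critical requests into single rides \(\langle r, r \rangle\). Greedy selection by weight is a simple stand-in (a 1/2-approximation of maximum-weight matching), not one of the evaluated algorithms; the `savings` weight, which ignores the taxi's approach leg, the planar Manhattan metric, and all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Request:
    rid: int
    t: float   # t_r: time the request appears
    s: Point   # s_r: pickup location
    d: Point   # d_r: destination
    k: float   # k_r: willingness to wait


def dist(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def savings(r1: Request, r2: Request) -> float:
    """Distance saved by sharing, minimized over pickup/drop-off orders."""
    shared = min(
        dist(a.s, b.s) + dist(b.s, x.d) + dist(x.d, y.d)
        for a, b in ((r1, r2), (r2, r1))   # who is picked up first
        for x, y in ((a, b), (b, a))       # who is dropped off first
    )
    return dist(r1.s, r1.d) + dist(r2.s, r2.d) - shared


def pair_batch(open_reqs: List[Request], now: float):
    """Greedy weighted matching on one batch, then single rides for critical ones."""
    edges = sorted(((savings(a, b), a, b) for a, b in combinations(open_reqs, 2)),
                   key=lambda e: e[0], reverse=True)
    matched, rides = set(), []
    for w, a, b in edges:
        if w <= 0:
            break  # remaining pairs do not save any distance
        if a.rid in matched or b.rid in matched:
            continue
        matched |= {a.rid, b.rid}
        rides.append((a, b))
    still_open = []
    for r in open_reqs:
        if r.rid in matched:
            continue
        if now - r.t >= r.k:
            rides.append((r, r))  # critical and unmatched: serve as a single ride
        else:
            still_open.append(r)
    return rides, still_open
```

Because only requests that are open in the same batch are compared, the temporal constraint (both requests ‘active’ at once) holds by construction, while the savings weight encodes the spatial-overlap constraint.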
3.2.5 Distance function
The optimal choice for a distance function would be the actual driving distance. Yet, our simulations require trillions of distance calculations, which makes querying actual driving distances infeasible. Given that the locations are given in latitude and longitude coordinates, it is tempting to use the Haversine formula^{Footnote 7} to estimate the Euclidean distance, as in related literature (Santos and Xavier 2013; Brown 2016a). We opted instead for the Manhattan distance, given that the simulation takes place mostly in Manhattan. To evaluate our choice, we collected more than 12 million actual driving distances using the Open Source Routing Machine (project-osrm.org), which computes shortest paths in road networks. Compared to the actual driving distance, the standard and mean squared errors of the Manhattan distance were \(0.5 \pm 2.9\) km and \(1.7 \pm 2.4\) km, respectively, while those of the Euclidean distance were \(3.2 \pm 3.8\) km and \(3.2 \pm 3.8\) km, respectively.
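For reference, a sketch of both distance estimates on raw coordinates: the Haversine great-circle distance and a Manhattan-style distance built from a north-south leg plus an east-west leg. How the Manhattan distance is computed from latitude/longitude is not specified above; measuring the legs along the latitude/longitude axes is our assumption, and since Manhattan's street grid is noticeably rotated relative to true north, a rotated frame may fit better.

```python
import math

R_EARTH_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * R_EARTH_KM * math.asin(math.sqrt(a))


def manhattan_km(lat1, lon1, lat2, lon2):
    """L1-style distance: a north-south leg, then an east-west leg (axis-aligned)."""
    return (haversine_km(lat1, lon1, lat2, lon1)
            + haversine_km(lat2, lon1, lat2, lon2))
```

By the triangle inequality, the Manhattan estimate is never shorter than the great-circle one, which is consistent with it tracking road distances, which are themselves never shorter than the straight line, more closely.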
3.2.6 Embedding into HSTs
A starting point of many of the employed k-server algorithms is embedding the input metric space \({\mathcal {X}}\) into a distribution \(\mu\) over \(\sigma\)-hierarchically well-separated trees (HSTs), with separation \(\sigma = \Theta (\log |{\mathcal {X}}| \log (k \log |{\mathcal {X}}|))\), where \(|{\mathcal {X}}|\) denotes the number of points. It has been shown that solving the problem on HSTs suffices, as any finite metric space can be embedded into a probability distribution over HSTs with low distortion (Fakcharoenphol et al. 2003). The distortion is of order \({\mathcal {O}}(\sigma \log _\sigma |{\mathcal {X}}|)\), and the resulting HSTs have depth \({\mathcal {O}}(\log _\sigma \Delta )\), where \(\Delta\) is the diameter of \({\mathcal {X}}\) (Bansal et al. 2015).
Given the popularity of this method, it is worth examining the size of the resulting trees. Since the geo-coordinate system is a discrete metric space, we could embed it directly into HSTs. However, the space is huge, so for a better discretization we opted to generate the graph of the street network of NYC, using data from openstreetmap.org. Similarly to (Santi et al. 2014), we filtered the streets, selecting only the primary, secondary, tertiary, residential, unclassified, road, and living street classes, and used those as undirected edges, with street intersections as nodes. The resulting graph for NYC contains 66543 nodes and 95675 edges (5018 and 8086, respectively, for Manhattan). From that graph, we generate the HSTs (Santi et al. 2014).
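To make the embedding step concrete, the following is a simplified sketch of a randomized FRT-style construction (after Fakcharoenphol et al.) on a small finite metric given as a distance matrix. It is an illustration under assumptions, not the implementation used here. Distances are assumed to be scaled so that the minimum nonzero distance is at least 1, the function names are hypothetical, and the edge lengths (halving at each level) give a 2-HST rather than the \(\sigma\)-HST above. Clusters are formed top-down: at each level, every point joins the first center, in a random order, that lies within a random radius.

```python
import math
import random


def frt_embedding(dist, seed=0):
    """Randomly embed a finite metric (n x n matrix, min nonzero distance >= 1) into a tree.

    Returns (top, paths): paths[x] lists the cluster center of x at levels
    top-1, ..., 0; a tree node is identified by a prefix of such a path.
    """
    rng = random.Random(seed)
    n = len(dist)
    diameter = max(max(row) for row in dist)
    top = max(1, math.ceil(math.log2(diameter)))
    beta = rng.uniform(1.0, 2.0)  # random radius scale
    order = list(range(n))
    rng.shuffle(order)            # random priority order of potential centers
    paths = [[] for _ in range(n)]
    for level in range(top - 1, -1, -1):
        radius = beta * 2 ** (level - 1)
        for x in range(n):
            # First center in the random order that lies within the radius (x itself qualifies).
            center = next(c for c in order if dist[x][c] <= radius)
            paths[x].append(center)
    return top, paths


def tree_distance(top, paths, x, y):
    """Leaf-to-leaf distance; the edge from a level-j node to its parent has length 2**(j+1)."""
    if x == y:
        return 0
    common = 0  # number of levels (from the top) at which x and y share a cluster
    while common < len(paths[x]) and paths[x][common] == paths[y][common]:
        common += 1
    split_level = top - 1 - common
    return 2 * sum(2 ** (j + 1) for j in range(split_level + 1))
```

Because two points that share a level-\((i+1)\) cluster are within \(2^{i+2}\) of each other, while their tree distance is \(2^{i+3} - 4 \ge 2^{i+2}\), the tree distances never underestimate the original ones; in FRT's analysis, the random order and radius are what keep the expected distortion low.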
4 Component algorithms for ridesharing
In this section, we describe our design choices for developing Component Algorithms for Ridesharing (CARs). Each CAR is composed of three parts (Fig. 1): (a) request-to-request matching to create a (shared) ride, (b) ride-to-taxi matching, and (c) relocation of the idle fleet. Each of these components is a significant problem in its own right. Complexity issues make the simultaneous consideration of all three problems impractical. Instead, a more realistic approach is to tackle each component individually, with minimal consideration of the remaining two^{Footnote 8}. The algorithms we consider are appropriate modifications of the most significant ones proposed for the key algorithmic primitives of the ridesharing problem (see Sects. 1.1 and 2): online and offline matching algorithms, with or without delays, for steps (a), (b), and (c); k-taxi/server algorithms for step (b); and heuristic approaches specifically designed with the ridesharing application in mind.
A list of all the CARs that we designed and evaluated (14 in total) can be found in Table 1, while in the following sections we provide a detailed description of each CAR component.
4.1 CAR components
We evaluated a variety of approaches, ranging from offline maximum weight matching (MWM) and greedy solutions to online MWM, k-taxi/server algorithms, and linear programming. Offline algorithms (e.g., MWM, ALMA, Greedy) can be run either in a just-in-time (JiT) manner, i.e., when a request becomes critical, or in batches, i.e., every x minutes (since our dataset has a granularity of 1 minute, we run batches of 1 and 2 minutes).
Matching Graphs: At time t, let \({\mathcal {G}}_a = ({\mathcal {R}}_t, {\mathcal {E}}^a_t)\), where \({\mathcal {E}}^a_t\) denotes the weighted edges between requests. With a slight abuse of notation, let \(\delta (s_{r_1}, s_{r_2}, d_{r_1}, d_{r_2})\) denote the minimum distance required to serve both \(r_1\) and \(r_2\) as a shared ride (i.e., excluding the case of serving one of them first and then the other) with a single taxi located either at \(s_{r_1}\) or at \(s_{r_2}\). The weight \(w_{r_1, r_2}\) of an edge \((r_1, r_2) \in {\mathcal {E}}^a_t\) is defined as \(w_{r_1, r_2} = \delta (s_1, d_1) + \delta (s_2, d_2) - \delta (s_{r_1}, s_{r_2}, d_{r_1}, d_{r_2})\) (similarly to (Danassis et al. 2019; Alonso-Mora et al. 2017a)). If \(r_1 = r_2\) (single passenger ride), let \(w_{r_1, r_2} = 0\). Intuitively, this number approximates the travel distance saved by matching requests \(r_1\) and \(r_2\); it is only an approximation, since the location of the taxi that will serve the ride cannot be known in advance^{Footnote 9}.
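As a small worked example of the edge weight \(w_{r_1, r_2}\), the sketch below enumerates the four routes that start at one of the two sources, pick up the other passenger before any drop-off, and then drop both off. It assumes the Manhattan distance on planar (x, y) coordinates and represents a request as a hypothetical (source, destination) tuple; the function names are illustrative only.

```python
def manhattan(a, b):
    """L1 distance between planar points a = (x, y) and b = (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def shared_route_length(s1, d1, s2, d2, dist=manhattan):
    """delta(s_r1, s_r2, d_r1, d_r2): shortest route that carries both passengers together."""
    routes = [
        (s1, s2, d1, d2), (s1, s2, d2, d1),  # taxi starts at s1
        (s2, s1, d1, d2), (s2, s1, d2, d1),  # taxi starts at s2
    ]
    return min(sum(dist(a, b) for a, b in zip(route, route[1:])) for route in routes)


def pair_weight(r1, r2, dist=manhattan):
    """w_{r1,r2}: distance saved by sharing instead of two separate single rides."""
    if r1 == r2:
        return 0  # single passenger ride
    (s1, d1), (s2, d2) = r1, r2
    return dist(s1, d1) + dist(s2, d2) - shared_route_length(s1, d1, s2, d2, dist)


r1 = ((0, 0), (10, 0))
r2 = ((1, 0), (9, 0))
print(pair_weight(r1, r2))
```

Here the two solo trips cost 10 + 8 = 18 and the best shared route (s1, s2, d2, d1) costs 10, so the weight is 8. A negative weight would indicate that sharing is worse than serving the two requests separately.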
Similarly, at time t, let \({\mathcal {G}}_b = ({\mathcal {V}}_t \cup {\mathcal {P}}_t, {\mathcal {E}}^b_t)\), where \({\mathcal {E}}^b_t\) denotes the weighted edges between rides and taxis. With a slight abuse of notation, let \(\delta (s_v, s_{r_1}, s_{r_2}, d_{r_1}, d_{r_2})\) denote the minimum distance required (over all possible pickup and drop-off orders) to serve both requests \(r_1\) and \(r_2\) (that compose the (shared) ride \(\rho\)) with a single taxi located at \(s_v\). The weight \(w_{v, \rho }\) of an edge \((v, \rho ) \in {\mathcal {E}}^b_t\) is defined as \(w_{v, \rho } = 1 / \delta (s_v, s_{r_1}, s_{r_2}, d_{r_1}, d_{r_2})\). If \(r_1 = r_2\) (single passenger ride), let \(\delta (s_v, s_{r_1}, s_{r_2}, d_{r_1}, d_{r_2}) = \delta (s_v, s_{r_1}, d_{r_1})\). For step (b) of the Ridesharing problem, we run the offline algorithms every time the set of rides \({\mathcal {P}}_t\) is not empty.
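The ride-to-taxi weight can be sketched in the same way: enumerate every order of the four stops in which each passenger is picked up before being dropped off, and take the shortest path from the taxi. The sketch assumes the Manhattan distance on planar coordinates and hypothetical function names; returning an infinite weight when \(\delta = 0\) is our own assumption to avoid division by zero.

```python
import math
from itertools import permutations


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def ride_route_length(taxi, ride, dist=manhattan):
    """delta(s_v, s_r1, s_r2, d_r1, d_r2): best valid pickup/drop-off order from the taxi."""
    r1, r2 = ride
    (s1, d1), (s2, d2) = r1, r2
    if r1 == r2:  # single passenger ride: delta(s_v, s_r1, d_r1)
        return dist(taxi, s1) + dist(s1, d1)
    stops = {"s1": s1, "d1": d1, "s2": s2, "d2": d2}
    best = math.inf
    for order in permutations(stops):
        # Each passenger must be picked up before being dropped off.
        if order.index("s1") > order.index("d1") or order.index("s2") > order.index("d2"):
            continue
        path = [taxi] + [stops[name] for name in order]
        best = min(best, sum(dist(a, b) for a, b in zip(path, path[1:])))
    return best


def taxi_ride_weight(taxi, ride, dist=manhattan):
    """w_{v,rho} = 1 / delta; assumption: infinite weight when delta is zero."""
    delta = ride_route_length(taxi, ride, dist)
    return math.inf if delta == 0 else 1.0 / delta
```

Using the reciprocal turns "shortest route" into "largest weight", so the same maximum weight matching machinery can be reused for step (b).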
4.1.1 Maximum weight matching (MWM)
The maximum weight matching algorithm finds a matching with maximum total edge weight in a graph. We use a maximum weight matching algorithm to

match requests into shared rides (step (a) of the Ridesharing problem), i.e., find a matching on \({\mathcal {G}}^a\) that maximizes the quantity \(\sum _{(r_1,r_2) \in {\mathcal {E}}^a_t} w_{r_1,r_2}\).

match rides with taxis (step (b) of the Ridesharing problem), i.e., find a matching on \({\mathcal {G}}_b\) that maximizes the quantity \(\sum _{(v, \rho ) \in {\mathcal {E}}^b_t} w_{v, \rho }\).
In both cases we use the well-known blossom algorithm of Edmonds (1965). Not surprisingly, MWM results in high-quality allocations, but this comes with a running-time overhead compared to simpler, 'local' solutions (see Sect. 7.2). This is because the worst-case time complexity of blossom on a graph (V, E) is \({\mathcal {O}}(|E| |V|^2)\), and we have to run it three times, once for each step of the Ridesharing problem. Additionally, the MWM algorithm inherently requires a global view of the whole request set in a time window, and is therefore not a good candidate for the fast, decentralized solutions that are more appealing for real-life applications.
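The objective that MWM optimizes can be illustrated with an exact but exponential-time brute force over small graphs. This is a stand-in for, not an implementation of, Edmonds' blossom algorithm; the function name and the dictionary-of-edges input format are hypothetical. In step (a), a node that is left unmatched corresponds to a single ride.

```python
from functools import lru_cache


def max_weight_matching_bruteforce(n, weights):
    """Exact MWM on nodes 0..n-1; weights maps (i, j) to w. Exponential: illustration only."""
    adj = {}
    for (i, j), w in weights.items():
        adj.setdefault(i, []).append((j, w))
        adj.setdefault(j, []).append((i, w))

    @lru_cache(maxsize=None)
    def best(mask):
        # mask: bitmask of nodes already decided (matched, or left unmatched)
        if mask == (1 << n) - 1:
            return 0.0, ()
        i = next(v for v in range(n) if not (mask >> v) & 1)
        # Option 1: leave i unmatched.
        value, pairs = best(mask | (1 << i))
        # Option 2: match i with an undecided neighbor j.
        for j, w in adj.get(i, []):
            if not (mask >> j) & 1:
                sub_value, sub_pairs = best(mask | (1 << i) | (1 << j))
                if w + sub_value > value:
                    value, pairs = w + sub_value, ((i, j),) + sub_pairs
        return value, pairs

    return best(0)


# Path 0-1-2-3: taking the heavy middle edge beats the two light outer edges.
print(max_weight_matching_bruteforce(4, {(0, 1): 1, (1, 2): 3, (2, 3): 1}))
```

The blossom algorithm reaches the same optimum in polynomial time, which is why it is used here; the brute force merely makes the objective explicit.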
4.1.2 ALtruistic MAtching Heuristic (ALMA), (Danassis et al. 2019, 2022, 2021; Danassis 2022; Danassis and Faltings 2020)
ALMA is a recently proposed lightweight heuristic for weighted matching. A distinctive characteristic of ALMA is that agents (in our context: requests / rides) make decisions locally, based solely on their own utilities. In particular, while contesting for a resource (in our context: request / taxi), each agent backs off with a probability that depends on its own utility loss from switching to its next most preferred resource. For example, for step (b) of the Ridesharing problem, if the next most preferred taxi after v of the agent representing ride \(\rho\) is \(v'\), then \(loss = w_{v, \rho } - w_{v', \rho }\). The back-off probability (\(P(\cdot )\)) is computed individually and locally, based on Equation 1^{Footnote 10}.
Intuitively, agents that do not have good alternatives are less likely to back off, and vice versa. The algorithm is inherently decentralized, requires only 1-bit partial feedback from the resource (indicating whether the resource is free or not), and, under reasonable assumptions on the preference domain of the agents, has a running time that is constant in the total problem size. It is therefore an ideal candidate for an on-device solution. Moreover, Danassis et al. (2019) showed that it achieves high-quality results on a simpler version of step (a) of the Ridesharing problem, and Danassis et al. (2022) showed that it can be adapted to protect the privacy of the agents.
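To illustrate the local back-off mechanism, the following is a simplified ALMA-style sketch, not the authors' implementation. The back-off function is a hypothetical stand-in for the probability of Equation 1: it assumes utilities normalized to [0, 1] and only captures the stated property that agents with worse alternatives back off less often. Agents skip resources that are already taken (the 1-bit feedback); when several agents contest the same resource in a round, each independently backs off to its next preferred resource with that probability.

```python
import random


def backoff_probability(loss, eps=0.1):
    # Hypothetical stand-in for Equation 1: a large loss from switching means the
    # agent is unlikely to back off; clipped to [eps, 1 - eps].
    return min(max(1.0 - loss, eps), 1.0 - eps)


def alma_sketch(utilities, seed=0, max_rounds=10_000):
    """utilities[a][r]: utility in [0, 1] of agent a for resource r. Returns {agent: resource}."""
    rng = random.Random(seed)
    n = len(utilities)
    # Each agent ranks resources using only its own utilities (local information).
    prefs = [sorted(range(len(u)), key=lambda r: -u[r]) for u in utilities]
    pos = [0] * n  # index of the resource each agent currently targets
    taken = {}     # resource -> agent that acquired it
    matched = {}   # agent -> resource
    for _ in range(max_rounds):
        contests = {}
        for a in range(n):
            if a in matched:
                continue
            # 1-bit feedback: skip resources that are already occupied.
            while pos[a] < len(prefs[a]) and prefs[a][pos[a]] in taken:
                pos[a] += 1
            if pos[a] < len(prefs[a]):
                contests.setdefault(prefs[a][pos[a]], []).append(a)
        if not contests:
            break
        for r, agents in contests.items():
            if len(agents) == 1:  # uncontested: acquire the resource
                taken[r] = agents[0]
                matched[agents[0]] = r
                continue
            for a in agents:  # collision: each agent decides independently
                nxt = pos[a] + 1
                next_u = utilities[a][prefs[a][nxt]] if nxt < len(prefs[a]) else 0.0
                if rng.random() < backoff_probability(utilities[a][r] - next_u):
                    pos[a] += 1  # back off to the next preferred resource
    return matched
```

Each agent only needs its own utilities and whether a resource is free, so the per-agent logic could in principle run on each device; the synchronous rounds are a simplification of this sketch.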
4.1.3 Greedy
Greedy is a very simple algorithm: it selects a node \(i \in V\) of a graph \(G=(V,E)\) uniformly at random, considers all the edges (i, j) incident to i, matches i with the node \(j^{*}\) at the other end of the heaviest such edge, i.e., \((i,j^{*}) \in \arg \max _j (w_{i, j})\), and repeats on the remaining unmatched nodes. Greedy approaches are appealing^{Footnote 11}, not only due to their low complexity, but also because real-time constraints dictate short planning windows, which diminish the benefit of batch optimization compared to myopic approaches (Widdows et al. 2017).
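A minimal sketch of the Greedy rule, assuming a hypothetical symmetric adjacency dictionary {node: {neighbor: weight}}. Shuffling the node order once and skipping nodes that are already matched is equivalent to repeatedly picking an unmatched node uniformly at random.

```python
import random


def greedy_matching(adjacency, seed=0):
    """adjacency: {node: {neighbor: weight}} (symmetric). Returns a list of matched pairs."""
    rng = random.Random(seed)
    order = list(adjacency)
    rng.shuffle(order)  # random visiting order = uniform choice among unmatched nodes
    matched = set()
    pairs = []
    for i in order:
        if i in matched:
            continue
        options = {j: w for j, w in adjacency[i].items() if j not in matched and j != i}
        matched.add(i)  # i is decided now: matched below, or left single
        if options:
            j = max(options, key=options.get)  # heaviest edge incident to i
            matched.add(j)
            pairs.append((i, j))
    return pairs
```

The result depends on the random order: on a star with a heavy and a light edge, visiting the light leaf first matches it to the center, leaving the heavy edge unused. This locality is the price paid for the low running time.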
4.1.4 Approximation (Appr), (Bei and Zhang 2018)
Approximation (Appr) is a recently proposed offline algorithm due to Bei and Zhang (2018), which can be used to solve steps (a) and (b) of the Ridesharing problem. The algorithm takes a two-phase approach that is also based on maximum weight matchings (or, more accurately, the equivalent notion of minimum cost matchings), but with different weights from the ones we defined for the MWM algorithm. In particular:

First, it matches requests to shared rides using minimum cost matching based on the shortest distance to serve any request pair but on the worst pickup choice. Formally, the algorithm defines the quantities:
$$\begin{aligned} w_{ij}= & {} \min \left\{ \delta (s_1,s_2)+\delta (s_2,d_1)+\delta (d_1,d_2), \delta (s_1,s_2)+\delta (s_2,d_2)+\delta (d_2,d_1)\right\} \\ w_{ji}= & {} \min \left\{ \delta (s_2,s_1)+\delta (s_1,d_1)+\delta (d_1,d_2), \delta (s_2,s_1)+\delta (s_1,d_2)+\delta (d_2,d_1)\right\} \end{aligned}$$and then chooses \(w^1(i,j) = \max \{w_{ij},w_{ji}\}\). Intuitively, \(w_{ij}\) is the distance of the shortest path that picks up request \(r_1\) first (at its source location \(s_1\)), and similarly, \(w_{ji}\) is the distance of the shortest path that picks up request \(r_2\) first.

Then it matches rides to taxis, again using minimum cost matching, taking the weight to be the distance to the closer of the two pickup locations. Formally, let \(w^2(v,\langle r_i, r_j \rangle ) = \min \{\delta (s_v, s_i), \delta (s_v,s_j)\}\), where \(s_v\) is the position of taxi v, and compute a minimum cost matching in the bipartite graph between taxis and the pairs \(\langle r_i, r_j \rangle\) matched in the previous step, with weights defined by \(w^2\).
Bei and Zhang (2018) prove a worst-case approximation guarantee of 2.5 for the algorithm.
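The two Appr cost functions can be written down directly from the definitions above. The sketch assumes the Manhattan distance on planar coordinates and hypothetical function names, and leaves the two minimum cost matchings themselves to any standard solver.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def appr_pair_cost(r1, r2, dist=manhattan):
    """w^1(i, j) = max{w_ij, w_ji}: cost of the pair under its worse pickup choice."""
    (s1, d1), (s2, d2) = r1, r2
    w_ij = min(dist(s1, s2) + dist(s2, d1) + dist(d1, d2),   # pick up r1 first
               dist(s1, s2) + dist(s2, d2) + dist(d2, d1))
    w_ji = min(dist(s2, s1) + dist(s1, d1) + dist(d1, d2),   # pick up r2 first
               dist(s2, s1) + dist(s1, d2) + dist(d2, d1))
    return max(w_ij, w_ji)


def appr_taxi_cost(taxi, r1, r2, dist=manhattan):
    """w^2(v, <r_i, r_j>): distance from the taxi to the closer of the two pickups."""
    (s1, _), (s2, _) = r1, r2
    return min(dist(taxi, s1), dist(taxi, s2))


r1 = ((0, 0), (10, 0))
r2 = ((1, 0), (9, 0))
print(appr_pair_cost(r1, r2), appr_taxi_cost((3, 0), r1, r2))
```

For these requests, \(w_{ij} = 10\) and \(w_{ji} = 11\), so \(w^1 = 11\): the pair is priced by its worse pickup order, and a taxi at (3, 0) has cost 2 to the nearer pickup.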
4.1.5 Postponed greedy (PG), (Ashlagi et al. 2019)
Postponed Greedy (PG) is another recently proposed algorithm for the maximum weight online matching problem with deadlines (step (a) of the Ridesharing problem). The algorithm is online: it makes decisions about the present without knowing which requests will appear in the future, and postpones its final commitments to hedge against such requests; Ashlagi et al. (2019) proved its competitive ratio to be 1/4. Contrary to our setting, the algorithm was designed for fixed deadlines, i.e., \(k_r = c, \forall r \in {\mathcal {R}}\).
The algorithm is best described in terms of an auction environment (Ashlagi et al. 2019), as follows. Let \(S_t\) and \(B_t\) be the sets of virtual sellers and virtual buyers at time t, respectively. When a request r appears at time t, the algorithm creates a virtual seller \(s_r\) and a virtual buyer \(b_r\) for that request and adds them to these sets, i.e., \(S_t \leftarrow S_{t-1} \cup \{s_r\}\) and \(B_t \leftarrow B_{t-1} \cup \{b_r\}\). In other words, every request has two copies: a buyer and a seller. These are placed in a virtual weighted bipartite graph \(G=(S_t,B_t,E_t)\), where the edge weights are defined in the same manner as the weights of \({\mathcal {G}}_a\) (see 'Matching Graphs' in Sect. 4.1). The algorithm then matches the newly added buyer \(b_r\) with a seller \(s_{r^*}\) in a greedy manner, i.e., \((b_r,s_{r^*}) \in \underset{r' \in S_{t-1}}{\arg \max }(w_{r, r'})\). This choice remains fixed for subsequent time steps. When the request r becomes critical (i.e., its deadline is about to expire), the 'role' of the request, either seller or buyer, is conclusively chosen uniformly at random. If r is a seller, and a subsequent buyer was matched with r, the match is finalized and included in the output matching.
The major difference between the setting considered by Ashlagi et al. (2019) and ours is that, in our setting, requests become critical out of order, and a critical request cannot be matched later. We therefore apply the following modification: when a request becomes critical and is determined to be a seller, the match is finalized (if one has been found); otherwise, the request is treated as a single ride.
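The following is a compact, event-driven sketch of PG with the modification described above. It is an illustration under assumptions, not the original algorithm of Ashlagi et al. (2019). Inputs are hypothetical (request id, arrival time, critical time) tuples and a weight function defined as for \({\mathcal {G}}_a\). Two further assumptions are labeled in the comments: each seller can be picked by at most one buyer, and a buyer only picks a seller with positive weight.

```python
import random


def postponed_greedy_sketch(requests, weight, seed=0):
    """requests: list of (rid, arrival_time, critical_time). Returns rides as (r, r') pairs."""
    rng = random.Random(seed)
    events = []
    for rid, arrival, critical in requests:
        events.append((arrival, 0, rid))   # kind 0 = arrival (sorts before criticality)
        events.append((critical, 1, rid))  # kind 1 = request becomes critical
    events.sort()
    sellers = []     # seller copies of earlier requests (S_{t-1})
    chosen_by = {}   # seller request -> buyer request that greedily picked it
    pending = set()  # requests not yet served
    rides = []
    for _, kind, r in events:
        if kind == 0:
            # Assumption: each seller can be picked by at most one buyer.
            free = [s for s in sellers if s in pending and s not in chosen_by]
            if free:
                best = max(free, key=lambda s: weight(r, s))
                if weight(r, best) > 0:  # assumption: only pair if it saves distance
                    chosen_by[best] = r
            sellers.append(r)
            pending.add(r)
        elif r in pending:
            pending.discard(r)
            buyer = chosen_by.get(r)
            # Role drawn uniformly at random; a seller finalizes its match if the buyer still waits.
            if rng.random() < 0.5 and buyer in pending:
                pending.discard(buyer)
                rides.append((r, buyer))
            else:
                rides.append((r, r))  # modification: otherwise served as a single ride
    return rides
```

Because a request leaves the pending set as soon as it is served, every request ends up in exactly one ride, regardless of the random role draws.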
4.1.6 Greedy dual (GD), (Bienkowski et al. 2018)
Greedy Dual is an online algorithm for the minimum cost (bipartite) perfect matching problem with delays, i.e., both steps (a) and (b) of the Ridesharing problem, and is based on the popular primal-dual technique (Goemans and Williamson 1997). The weight (cost) of an edge in this setting also includes arrival times; specifically:
where \(u_{\text {average}}\) is the average speed (see Sect. 3.2.2). The algorithm partitions all the requests into active sets, starting with the singleton \(\{r\}\) for a newly arrived request r. As is typical in the primal-dual approach, at every timestep t these active sets 'grow' until the weights of the edges between different active sets make the dual constraints of the problem tight (i.e., satisfied with equality). At this point, the active sets merge, and the algorithm matches as many pairs of free requests in these sets as possible.
The algorithm has a competitive ratio of \({\mathcal {O}}(|{\mathcal {R}}|)\) and works with infinite metric spaces, potentially making it better suited for applications like the Ridesharing problem. Yet, in our setup, it does not take into account the willingness to wait (\(k_r\)), thus missing matches for requests that have become critical. Although it was also designed for bipartite matchings, we opted not to use it for step (b), since that would require creating a new node every time a taxi drops off a ride and becomes available.
4.1.7 Balance (Bal), (Manasse et al. 1990)
Balance is a simple and classic algorithm for the k-server problem from the literature on competitive analysis. The rationale behind the algorithm is to balance the distance traveled by the taxis over the course of their operation, keeping the workload as equal as possible. In particular, a ride is served by the taxi with the minimum sum of the distance traveled so far plus its distance to the source of the ride (chosen uniformly at random between the sources of the two requests composing the ride). Specifically, ride \(\rho\) will be matched to taxi \(v^{*}\):
$$v^{*} \in \arg \min _{v \in {\mathcal {V}}} \left( \text {driven}(v) + \delta (s_v, s_{\rho }) \right)$$
where \(\text {driven}(v)\) denotes the distance driven by taxi v so far, and \(s_{\rho }\) is selected equiprobably between \(s_1\) and \(s_2\). The algorithm is min-max fair, i.e., it greedily minimizes the maximum accumulated distance among the taxis. The competitive ratio of the algorithm is \(|{\mathcal {X}}| - 1\) in arbitrary metric spaces with \(|{\mathcal {X}}|\) points (Manasse et al. 1990).
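A minimal sketch of the Balance rule, assuming planar Manhattan distances, a hypothetical {taxi_id: location} dictionary, and a record of the distance each taxi has driven so far. The caller is responsible for updating `driven` and the taxi's position after the ride is served.

```python
import random


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def balance_assign(taxis, driven, ride, rng, dist=manhattan):
    """Return the taxi that serves `ride` under the Balance rule.

    taxis: hypothetical {taxi_id: (x, y)}; driven: {taxi_id: distance driven so far};
    ride: ((s1, d1), (s2, d2)), with identical requests for a single ride.
    """
    (s1, _), (s2, _) = ride
    source = rng.choice([s1, s2])  # s_rho, chosen equiprobably between the two sources
    # Min-max fairness: minimize distance driven so far plus distance to the pickup.
    return min(taxis, key=lambda v: driven[v] + dist(taxis[v], source))


taxis = {"x": (0, 0), "y": (5, 0)}
driven = {"x": 10, "y": 0}
ride = (((1, 0), (2, 0)), ((1, 0), (2, 0)))
print(balance_assign(taxis, driven, ride, random.Random(0)))
```

Taxi x is closer to the pickup (distance 1 versus 4), but taxi y is selected because 0 + 4 < 10 + 1: Balance trades pickup distance for an even workload.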
4.1.8 Harmonic (Har), (Raghavan and Snir 1989)
The Harmonic algorithm (Har) is another classic randomized algorithm from the k-server literature; it is simple and memoryless (i.e., it does not need to 'remember' the decisions it took in previous steps). The algorithm matches a taxi with a ride with probability inversely proportional to the distance from the ride's source (chosen uniformly at random between the sources of the two requests composing the ride). Specifically, ride \(\rho\) will be matched to taxi v with probability:
where \(s_{\rho }\) and \(s_{\rho '}\) are selected equiprobably between \(s_1\), \(s_2\) and between \(s_{1'}\), \(s_{2'}\), respectively. The price of this simplicity is a high competitive ratio of \({\mathcal {O}}(2^{|{\mathcal {V}}|} \log |{\mathcal {V}}|)\) (Bartal and Grove 2000).
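A minimal, memoryless sketch of the Harmonic selection step. The normalization of the original formula is not reproduced here, so the sketch assumes the classic Harmonic rule of normalizing the inverse distances over the taxis. It also assumes planar Manhattan distances and a hypothetical {taxi_id: location} dictionary, and it lets a taxi already at the source take the ride, to avoid division by zero.

```python
import random


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def harmonic_assign(taxis, ride, rng, dist=manhattan):
    """Pick a taxi with probability proportional to 1 / distance to the ride's source.

    taxis: hypothetical {taxi_id: (x, y)}; ride: ((s1, d1), (s2, d2)).
    """
    (s1, _), (s2, _) = ride
    source = rng.choice([s1, s2])  # s_rho, chosen equiprobably
    ids = list(taxis)
    distances = [dist(taxis[v], source) for v in ids]
    for v, d in zip(ids, distances):
        if d == 0:  # assumption: a taxi already at the source takes the ride
            return v
    inverse = [1.0 / d for d in distances]
    return rng.choices(ids, weights=inverse, k=1)[0]
```

With taxis at distances 1 and 4 from the source, the nearer taxi is chosen with probability \(1 / (1 + 1/4) = 0.8\). No state is kept between calls, which is what makes the algorithm memoryless.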
4.1.9 Double coverage (DC), (Chrobak et al. 1990)
Double Coverage (DC) is one of the two most famous k-server algorithms in the literature. The algorithm is designed to run on a specific type of metric space called an HST (Hierarchical Separated Tree, see Sect. 3.2.6). For a general metric space \({\mathcal {X}}\), the algorithm can be applied by first embedding \({\mathcal {X}}\) into an HST (a process referred to as 'HST embedding'). This process 'simulates' the general space \({\mathcal {X}}\) by an HST, in the sense that the HST approximately captures the properties of the original space \({\mathcal {X}}\). The points of \({\mathcal {X}}\) are the leaves of the HST.
Given an HST, the algorithm works as follows. To determine which taxi will serve a ride, all unobstructed taxis move with equal speed towards the ride's source, i.e., a leaf of the HST (chosen randomly between the sources of the two requests sharing the ride). Initially, all taxis are unobstructed. During this movement, a taxi becomes obstructed when its path from its current location to the ride's leaf is 'blocked' by another taxi, meaning that, to reach the leaf, it would have to pass through a position in the tree that another taxi has already reached. In this case, the taxi stops (as the 'blocking' taxi is closer to serving the ride), while the remaining taxis keep moving as before. When some taxi reaches the ride's leaf, the process stops, and each taxi keeps its current position on the HST.
To implement the algorithm, we first appropriately discretize our metric space and then perform the HST embedding as described in (Bartal 1996; Fakcharoenphol et al. 2004) (see Sect. 3.2.6 for more details). Since only leaves correspond to locations in \({\mathcal {X}}\), we implement the lazy version of the algorithm (which is worst-case equivalent to the original definition; see, e.g., (Koutsoupias 2009)), i.e., only the taxi serving the ride moves on \({\mathcal {X}}\). One can envision a process in which the taxis 'virtually' move as described above, but once the ride has been served, all taxis are restored to their original positions. This is also in line with the main goal of minimizing the distance driven. The algorithm is k-competitive on all tree metrics (Chrobak and Larmore 1991a).
4.1.10 Work function (WFA), (Chrobak and Larmore 1991b; Koutsoupias and Papadimitriou 1995)
The Work Function algorithm (WFA) is perhaps the most important k-server algorithm, as it provides the best competitive ratio to date, due to the celebrated result of (Koutsoupias and Papadimitriou 1995). Intuitively, to decide which taxi (or server) will serve a ride that has just appeared at time t, and, more generally, how the other taxis move, the algorithm:

computes the (offline) optimal solution until time \(t - 1\), meaning the best possible allocation of rides to taxis using the information from the beginning of the algorithm until the appearance of the ride at time t,

computes a greedy cost for switching between configurations,

chooses the new taxi positions that minimize the sum of the two aforementioned costs.
More formally, let \(L^t = (l^t_1, l^t_2, \dots , l^t_{|{\mathcal {V}}|})\) denote the configuration of the fleet of taxis \({\mathcal {V}}\) at timestep t, i.e., a vector of taxi locations, where \(l^t_v\) specifies the location of taxi v. Let \(\text {OPT}_t(L)\) be the optimal (total distance-minimizing) way of serving the rides that appear at times 1 through t, such that the taxis end up in configuration L. To choose configuration \(L^{t}\), the algorithm uses the following rule:
The WFA serves ride \(\rho _t\) at timestep t by switching from the current taxi configuration \(L^{t-1}\) to a new configuration \(L^{t}\). Specifically, it selects the \(L^{t}\) that minimizes (a) the minimum total cost of starting from \(L^{0}\), serving in turn \(\rho _0, \rho _1, \dots , \rho _{t-1}\), and ending up in \(L^{t}\), plus (b) the distance traveled by the taxis to move from their positions in \(L^{t-1}\) to those in \(L^{t}\).
An obvious obstacle that makes the algorithm intractable in practice is that its complexity increases from step to step, resulting in computation and/or memory issues. To circumvent this obstacle, we implemented an efficient variant using network flows, as described in (Rudec et al. 2013). Yet, as the authors of (Rudec et al. 2013) also state, the only practical way of using the WFA is to switch to its window version, w-WFA, which only optimizes over the last w rides. Even though the complexity of w-WFA does not change between timesteps, it does change with the number of taxis. The resulting network has \(2|{\mathcal {P}}| + 2|{\mathcal {V}}| + 2\) nodes, and we have to run the Bellman-Ford algorithm (Bellman 1958) at least once to compute the potentials of the nodes and make the costs positive (Bellman-Ford runs in \({\mathcal {O}}(|{\mathcal {P}}||{\mathcal {V}}|)\)). We refer the reader to (Bertsekas 1998) for more details on network optimization. As before, the source of the ride is chosen randomly between the sources of the two requests composing the ride.
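The WFA rule can be made concrete with a brute-force sketch on a tiny discrete metric. This is the textbook work-function recursion, exponential in the number of taxis, and it is neither the network-flow variant of Rudec et al. (2013) nor the windowed w-WFA used here. Configurations are multisets of taxi locations, a ride is reduced to a single point (its chosen source), and the function names are hypothetical.

```python
from itertools import combinations_with_replacement, permutations


def config_distance(a, b, dist):
    """Cost of moving configuration a to configuration b (min-cost matching, brute force)."""
    return min(sum(dist(x, y) for x, y in zip(a, p)) for p in permutations(b))


def work_function_algorithm(points, dist, start, requests):
    """Serve each request (a point) with the WFA rule; returns final configuration and cost."""
    k = len(start)
    configs = list(combinations_with_replacement(points, k))
    # w_0(L): cost of reaching configuration L from the initial configuration.
    work = {c: config_distance(start, c, dist) for c in configs}
    current, total = tuple(start), 0
    for r in requests:
        covering = [c for c in configs if r in c]
        # w_t(L) = min over configurations Y covering r of w_{t-1}(Y) + D(Y, L).
        work = {x: min(work[y] + config_distance(y, x, dist) for y in covering) for x in configs}
        # WFA: pick a covering configuration minimizing w_t(L) + D(L^{t-1}, L).
        nxt = min(covering, key=lambda c: work[c] + config_distance(current, c, dist))
        total += config_distance(current, nxt, dist)
        current = nxt
    return current, total


print(work_function_algorithm([0, 1, 2, 3, 4], lambda a, b: abs(a - b), (0, 4), [1]))
```

With taxis at 0 and 4 on a line and a ride at 1, the rule moves the taxi at 0 by one unit. The number of configurations grows combinatorially with the number of taxis, which illustrates why only windowed or flow-based variants are usable at scale.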
4.1.11 k-Taxi, (Coester and Koutsoupias 2019)
This is a very recent algorithm for the k-taxi problem, which provides the best possible competitive ratio. The algorithm operates on HSTs, where the rides and taxis at any time are placed at the leaves. First, it generates a Steiner tree that spans the leaves holding taxis or rides, and then uses this tree to schedule rides by simulating an electrical circuit. In particular, whenever a ride appears at a leaf, the algorithm interprets each edge of the tree with length R as a resistor with resistance R; these resistances determine the fraction of the current flow that is routed from the node of each taxi towards the ride. These fractions are then interpreted as probabilities that determine which taxi is chosen to pick up the ride.
4.1.12 High capacity (HC), (Alonso-Mora et al. 2017a)
This algorithm comes from a highly cited paper and is the only one among our evaluated approaches that addresses vehicle relocation (step (c)). Contrary to our approach, it tackles steps (a) and (b) simultaneously, leaving step (c) as a separate subproblem. The algorithm consists of five steps:

(i)
Computing a pairwise request-vehicle shareability graph (RV-graph) (Santi et al. 2014). The RV-graph represents which requests and vehicles might be pairwise-shared, with edges connecting all possible requests to pair and all possible vehicles to serve a request.

(ii)
Computing a graph consisting of feasible (candidate) trips and the set of vehicles that can execute them (RTV-graph). This is a tripartite graph with edges connecting requests to trips (a request is connected to a trip if it is part of it), and edges connecting trips to vehicles (an edge between a vehicle and a trip exists if the vehicle is able to serve the trip).

(iii)
Computing a greedy solution for the RTV-graph. In this step, rides are assigned to vehicles iteratively, in decreasing order of trip size (in our case, we first assign shared rides (two requests), and then single rides) and increasing order of cost (e.g., delay).

(iv)
Solving an ILP to compute the best assignment of vehicles to trips, using the previously computed greedy solution as an initial solution.

(v)
(optional) Rebalancing of free vehicles. If there remain any unassigned requests, it solves an ILP to optimally assign them to idle vehicles based on travel times.
We use CPLEX (Bliek et al. 2014) to solve the ILPs.
4.1.13 Baseline: single ride
Uses MWM to schedule the serving of single rides to taxis (there is no ridesharing, i.e., we omit step (a) of the Ridesharing problem).
4.1.14 Baseline: random
Makes random matches, provided that the edge weight is non-negative.
While our evaluation contains many recently proposed algorithms for matching, the observant reader might notice that, with the exception of k-taxi, our k-server algorithms come from the classical literature. We did consider more recent k-server algorithms (e.g., (Dehghani et al. 2017; Lee 2018; Bansal et al. 2015)), but their complexity turns out to be prohibitive. This is mainly because they proceed via an 'online rounding' of an LP relaxation of the problem, which maintains a variable for every (timestep, point in the metric space) pair. Even for one hour (3600 timesteps) and our discretization of Manhattan (5018 nodes), we need more than 18 million variables (230 million for NYC).
5 Scalability challenges
To highlight the challenges in the design of CARs, we refer to our evaluation setting (see Sect. 3.2), which accurately models a real-world application, both in scale and in detail. Let \({\mathcal {V}}\) and \({\mathcal {R}}\) denote the sets of vehicles and requests, respectively. Recall that in our setting, which uses real data from NYC taxi records, there are 272 new requests per minute on average, totaling 391479 requests in the broader NYC area (352455 in Manhattan) on the evaluated day (January 15, 2006). By law, there are 13,587 taxis in NYC^{Footnote 12}.
5.1 ILP approaches
A natural approach would be to use Integer Linear Programs (ILPs) to match passengers to other passengers or to rides under spatial and temporal constraints, similarly to the High Capacity algorithm of (Alonso-Mora et al. 2017a) (which can be seen as a CAR with steps (a) and (b) intertwined). As is commonly the case with ILPs, the problem is scalability: the number of variables can be as large as \({\mathcal {O}}(|{\mathcal {V}}| |{\mathcal {R}}|^2)\), which amounts to 27 to 216 million variables, given that at every timestep we have approximately 300 to 600 requests and as many taxis, while the number of constraints is \(|{\mathcal {V}}| + |{\mathcal {R}}|\). This makes ILP approaches prohibitive as components of CARs; problems of this size make it hard even to compute the initial greedy solution in real time. Alonso-Mora et al. circumvent this issue by enforcing delay constraints; specifically, they ignore requests that are not matched to any vehicle within a maximum waiting time. This is not possible in our model, since we have to serve all requests (service guarantee).^{Footnote 13}
5.2 MWM approaches
Given that all three parts of the ridesharing problem can be viewed as matching problems, a natural approach would be to run maximum-weight matching (MWM) in batches (e.g., (Bei and Zhang 2018)), i.e., to serve the requests that have accumulated over a pre-specified time window. The MWM problem can be solved via the classic blossom algorithm (Edmonds 1965), with a running time of \({\mathcal {O}}(|E| |V|^2)\) on a graph (V, E).
5.3 k-server/taxi algorithms
Many of these algorithms (e.g., the classic double coverage (Chrobak et al. 1990)) operate by embedding the input metric space \({\mathcal {X}}\) into a distribution \(\mu\) over Hierarchical Separated Trees (HSTs); to apply them in practice, it is therefore necessary to examine the size of these trees. Since the geo-coordinate system is a discrete metric space, we could embed it directly into HSTs. However, the space is huge, so for a better discretization we opted to generate the graph of the street network of NYC (see Sect. 3.2.6). The resulting graph for NYC contains 66543 nodes and 95675 edges (5018 and 8086, respectively, for Manhattan). Here, there is an obvious interplay between the accuracy of the embedding and the algorithm's complexity.
More recent k-server algorithms (e.g., (Dehghani et al. 2017; Lee 2018; Bansal et al. 2015)) use sophisticated 'online rounding' techniques. These, however, require maintaining a variable for every (timestep, point in the metric space) pair, which makes them prohibitive for any large-scale real-world application: even for one hour (3600 timesteps) and our discretization of Manhattan (5018 nodes), we would need more than 18 million variables (230 million for NYC).
5.4 Observability
Most approaches are centralized and require a global view of the entire time window, which is hard to scale. As autonomous agents proliferate, a practical CAR must be distributed and should ideally run on-device.
6 Vehicle relocation challenges
There are two ways to enforce relocation: passive and active. Ridesharing platforms, like Uber and Lyft, have implemented market-driven pricing as a passive form of relocation. Counterfactual analysis performed in (Buchholz 2018) shows that implementing pricing rules can result in daily net surplus gains of up to 232000, and 93000 additional daily taxi-passenger matches. While the gains are substantial, the market might be slow to adapt, and drivers and passengers do not always follow equilibrium policies. In contrast, our approach is active, in the sense that we directly enforce relocation. Moreover, we adopt a more anthropocentric approach: in our setting, the demand is fixed, so the goal is not to increase revenue by serving more rides, but rather to improve the QoS^{Footnote 14}.
There are many ways to approach dynamic relocation. Most of the employed relocation approaches are coarse-grained: the network is generally divided into several zones, blocks, etc. (Guériau and Dusparic 2018; Vosooghi et al. 2019; Martínez et al. 2017), and the entities (e.g., the vehicles) move between the zones. However, compared to other shared mobility systems, dynamic ridesharing poses unique challenges, which make such coarse-grained approaches inappropriate^{Footnote 15}: most of them are centralized (and thus computationally intensive and not scalable), they might not take into account the actions of other vehicles, potentially leading to oversaturation of high-demand areas, and, most importantly, they are slow to adapt to the highly dynamic nature of the problem (e.g., responding to high demand generated by a concert, or to the fact that vehicles remain free for only a few minutes at a time). The problem clearly calls for fine-grained solutions, yet such approaches are still rather scarce in the literature. High Capacity (HC) employs fine-grained relocation: it solves an ILP, which can reach high-quality results, but is neither scalable nor practical. Ideally, we would like a solution that can run on-device. The k-server algorithms perform an implicit relocation, yet they are primarily developed for adversarial scenarios and do not utilize the plethora of historic data^{Footnote 16}. In reality, requests follow patterns that emerge from human habits (e.g., during the first half of the day in Manhattan, there are many more drop-offs than pickups in Midtown (Buchholz 2018)).
6.1 Patterns in customer requests
To confirm the existence of transportation patterns, we performed the following analysis. For each request r on January 15^{Footnote 17}, we searched the past three days for requests \(r'\) such that \(|t_{r} - t_{r'}| \le 10\), \(\delta (s_{r}, s_{r'}) \le 250\), and \(\delta (d_{r}, d_{r'}) \le 250\). The results are depicted in Fig. 3. On average, \(13.3\%\) of the trips are repeated across all three previous days, peaking at \(43.7\%\) during rush hours (e.g., 6 to 8 in the morning). Although this share may seem modest, predicting transport demand from historic data is not an easy task, and \(13.3\%\) corresponds to about 47000 trips, which is significant in raw numbers.
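The repetition test above can be sketched as follows. The sketch assumes that times are minutes of the day and that locations are planar coordinates in meters compared with the Manhattan distance; the tuple layout and function names are hypothetical, and the thresholds mirror the values above.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def is_repeated(request, past_days, max_dt=10, max_dist=250, dist=manhattan):
    """True if a similar request occurred on every one of the past days.

    request: hypothetical (minute_of_day, source, destination) tuple;
    past_days: one list of such tuples per previous day.
    """
    t, s, d = request

    def similar(other):
        t2, s2, d2 = other
        return abs(t - t2) <= max_dt and dist(s, s2) <= max_dist and dist(d, d2) <= max_dist

    return all(any(similar(o) for o in day) for day in past_days)


def repeated_share(requests, past_days, **kwargs):
    """Fraction of requests that are repeated across all past days."""
    if not requests:
        return 0.0
    return sum(is_repeated(r, past_days, **kwargs) for r in requests) / len(requests)
```

A linear scan like this is quadratic in the number of requests; for a full day of data, a spatial or temporal index would be the natural next step.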
6.2 Relocation matching graph
Given the high density of the requests and the low frictions of the taxis (i.e., taxis remain free for relocation only for a short time window), we opted for a simple, fine-grained matching approach. We use the history to predict a set of expected future requests. Specifically, let D and T be the sampling windows, in days and minutes, respectively (we used \(D = 3\) and \(T = 2\)), and let t denote the current timestep. The set of past requests in our sampling window is \({\mathcal {R}}_{\text {past}} = \{r: |t_{r} - t| \le T\}\), restricted to requests r that appeared at most D days prior to t. The set of expected future requests \({\mathcal {R}}_{\text {future}}\) is generated by sampling from \({\mathcal {R}}_{\text {past}}\). Relocation is performed in a just-in-time manner, every time the set of idle vehicles is not empty. We generate matching graphs similar to those of Sect. 4.1, and then proceed to match requests into shared rides, and rides to idle taxis. The difference is that the set of nodes of \({\mathcal {G}}_a\) is now \({\mathcal {R}}_{\text {future}} \cup {\mathcal {R}}_{t}\). Finally, each idle taxi starts moving towards the source of its match (since these are expected rides, the source is picked at random between the sources of the two requests composing the ride).
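The sampling step can be sketched as below. The sketch makes several assumptions that the text does not fix: history entries are hypothetical (days_ago, minute_of_day, source, destination) tuples, the sample is drawn without replacement, the sample size `n_samples` is a free parameter, and wrap-around at midnight is ignored.

```python
import random


def expected_future_requests(history, now, D=3, T=2, n_samples=5, seed=0):
    """Sample R_future from R_past.

    history: hypothetical list of (days_ago, minute_of_day, source, destination).
    R_past keeps requests from the last D days within T minutes of `now` (time of day).
    """
    rng = random.Random(seed)
    past = [r for r in history if 1 <= r[0] <= D and abs(r[1] - now) <= T]
    return rng.sample(past, min(n_samples, len(past)))
```

The sampled requests would then be added to the current requests \({\mathcal {R}}_{t}\) and matched exactly like real ones; only the idle taxis act on the resulting assignment.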
7 Evaluation
7.1 Employed CARs
Evaluating all possible combinations of CAR components is infeasible. To make the evaluation tractable, we first consider only the first two steps of the ridesharing problem (i.e., no relocation). When possible, we use the same component for both steps (a) and (b). k-taxi/server algorithms, however, cannot solve step (a), so for step (a) we use the best-performing component, namely offline maximum-weight matching (MWM) run in batches. We then evaluate step (c), testing only the most promising components (namely MWM and ALMA, plus Greedy as a baseline). We begin by isolating step (c): we fix the component for (a) and (b) to MWM, to have common ground for evaluating relocation. Finally, we present results on end-to-end solutions. A list of all the evaluated CARs can be found in Table 1, while Table 2 contains a summary of all the evaluated metrics.
7.2 Simulation results
In this section we present the results of our evaluation. For every metric we report the average value over 8 runs. In what follows we briefly detail only the most relevant results. Please refer to Appendix A for the complete results, including larger testcases on the broader NYC area, omitted metrics, standard deviation values, and omitted algorithms (e.g., WFA and HC, which had to be evaluated in smaller testcases).
Figures 4, 7, 5, and 6 present the results without relocation. We first present results on one hour (Figs. 4 and 7) with the base number of taxis (see Sect. 3.2.2). Then, we show that the results are robust at a larger timescale^{Footnote 18} (Fig. 5) and for a varying number of vehicles^{Footnote 19} (2138 to 12828) (Fig. 6). Finally, we present results on step (c) of the ridesharing problem: dynamic relocation (Table 3, Fig. 8).
7.2.1 Distance driven
In the small testcase (Fig. 4a) MWM performs the best, followed by Bal (\(+7\%\)). ALMA comes third (\(+19\%\)), and then Greedy (\(+21\%\)). Bal performs well on this metric because it uses MWM for step (a), which has the more significant impact on the distance driven. Similar results are observed for the whole day (Fig. 5a), with Bal, ALMA, and Greedy achieving \(+4\%\), \(+18\%\), and \(+22\%\) compared to MWM, respectively. Figure 6a shows that as we decrease the number of taxis, Bal loses its advantage and Greedy falls further behind ALMA (\(9\%\) worse than ALMA), while ALMA closes the gap to MWM (\(+17\%\)).
7.2.2 Complexity
To estimate the complexity, we measured the elapsed time of each algorithm. Greedy is the fastest one (Fig. 4b), closely followed by Har, Bal, and ALMA. ALMA is inherently decentralized. The red overlay denotes the parallel time for ALMA, which is 2.5 orders of magnitude faster than Greedy.
7.2.3 Time to pickup
MWM exhibits exceptionally low time to pickup (Fig. 4c), lower even than the single ride baseline. ALMA, Greedy, and Bal have \(+69\%\), \(+76\%\), and \(+33\%\) compared to MWM, respectively. As before, Fig. 6b shows that as we decrease the number of taxis, Bal loses its advantage and Greedy falls further behind ALMA. Note that, to improve visualization, we removed DC’s pickup time, as it was one order of magnitude larger than Appr’s.
7.2.4 Delay
PG exhibits the lowest delay (Fig. 4d), but only because it makes \(26\%\) fewer shared rides than the rest of the high performing algorithms. Among the remaining algorithms, ALMA has the smallest delay (\(-13\%\) compared to MWM), followed by Greedy (\(-1\%\)), while Bal has \(+63\%\), all relative to MWM. As the number of taxis decreases (Fig. 6c), ALMA’s gains increase further (\(-22\%\) compared to MWM).
Figures 5d and 6d depict the cumulative delay, which is the sum of all delays described in Sect. 3.1.2, namely the time to pair, time to pair with taxi, time to pickup, and delay. An interesting observation is that reducing the fleet size from 12828 (\(\times 3.0\) the base number) to just 3207 (\(\times 0.75\) the base number) vehicles (a \(75\%\) reduction) results in only approximately 2 minutes of additional delay (Fig. 6d). This highlights the considerable efficiency gains such technologies have to offer.
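As a quick check of the fleet sizes quoted in this section, the stated multipliers imply a base fleet of 4276 vehicles (inferred here from the numbers above, not restated from Sect. 3.2.2):

\[ N_{\text{base}} = \frac{12828}{3.0} = 4276, \qquad 0.75 \times 4276 = 3207, \qquad 0.5 \times 4276 = 2138, \]
\[ 1 - \frac{3207}{12828} = 1 - 0.25 = 0.75 . \]

The last line confirms the \(75\%\) reduction between the largest fleet and the \(\times 0.75\) fleet; 2138 is the \(\times 0.5\) fleet at the lower end of the range used in Fig. 6.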
Finally, we investigate the distribution of the achieved QoS metrics and, consequently, the reliability/fairness of each CAR. To this end, Fig. 7a plots the sequence of percentiles^{Footnote 20} of the cumulative delay. As shown, the vast majority of users (\(75\%\)) experience a cumulative delay close to the average value (only 46, 85, 92, and 69 seconds more than the average for MWM, ALMA, Greedy, and Bal, respectively). Naturally, some users experience a high cumulative delay, but they represent a small percentage: fewer than \(5\%\) of requests experience a delay of more than 8.5, 13, 13, and 9.5 minutes for MWM, ALMA, Greedy, and Bal, respectively. Given the size of Manhattan and the average speed of taxis there, such delays are to be expected and, thus, acceptable; ultimately, it is up to the ridesharing platform to impose hard constraints and reject requests with potentially high delay.
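To make the percentile analysis concrete, the following Python sketch computes a per-request cumulative delay as the sum of the four components listed in Sect. 3.1.2 and evaluates percentiles with the linear-interpolation definition given in the footnote (the value q/100 of the way from the minimum to the maximum of the sorted vector, which coincides with NumPy's default percentile method). The dictionary keys, helper names, and sample numbers are hypothetical; this is an illustration, not the evaluation code behind Fig. 7a.

```python
import math
from statistics import mean
from typing import Dict, List


def cumulative_delay(components: Dict[str, float]) -> float:
    """Sum of the four per-request delay components, in seconds (key names are hypothetical)."""
    return (components["time_to_pair"]
            + components["time_to_pair_with_taxi"]
            + components["time_to_pickup"]
            + components["delay"])


def percentile(values: List[float], q: float) -> float:
    """Value q/100 of the way from the minimum to the maximum of the sorted values,
    interpolating linearly between neighbouring order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def tail_report(delays: List[float]) -> Dict[str, float]:
    """Gap between the 75th percentile and the mean, plus the 95th percentile."""
    avg = mean(delays)
    return {
        "p75_minus_mean": percentile(delays, 75) - avg,
        "p95": percentile(delays, 95),
    }


if __name__ == "__main__":
    # Hypothetical per-request components, in seconds.
    requests = [
        {"time_to_pair": 30, "time_to_pair_with_taxi": 0, "time_to_pickup": 240, "delay": 60},
        {"time_to_pair": 60, "time_to_pair_with_taxi": 0, "time_to_pickup": 300, "delay": 90},
        {"time_to_pair": 45, "time_to_pair_with_taxi": 5, "time_to_pickup": 600, "delay": 120},
    ]
    print(tail_report([cumulative_delay(r) for r in requests]))
```

The `p75_minus_mean` entry corresponds to the "additional seconds over the average" quoted above, and `p95` to the tail thresholds.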
7.2.5 Frictions
Figure 5b shows the driver frictions. On this metric, k-server algorithms outperform matching algorithms by far. Compared to MWM, Bal and Har achieve a \(63\%\) and \(73\%\) decrease, respectively, while ALMA and Greedy achieve a \(26\%\) and \(21\%\) decrease, respectively. Given a fixed supply, lower frictions indicate a more even distribution of rides amongst taxis.
It is important to note that while the results for all the other metrics are consistent when moving from the one hour testcase to the full day testcase, this is not true for the frictions (see Figs. 9i and 12i and Tables 6 and 10 in the Appendix). This is because taxis that serve zero or one rides are assumed to have zero friction by definition. Algorithms like Bal – which attempts to balance the distance driven by each taxi – will utilize each vehicle multiple times, even within the short window of one hour. This results in a misleadingly high friction value in the one hour testcase. In fact, the number of taxis that served fewer than two rides (and, thus, had zero friction) in the one hour testcase was 483 for Bal, compared to 1368 for MWM (almost 3 times larger), 1181 for ALMA, and 1120 for Greedy. This is why we opted to present the frictions for the full day testcase in Fig. 5b.
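The effect of the zero-friction convention can be illustrated with a small Python sketch. The friction measure itself is defined in Sect. 3 and is not restated here, so the sketch assumes, hypothetically, that a taxi's friction is the total idle time between the dropoff of one ride and the pickup of the next. Under any such gap-based measure a taxi with fewer than two rides contributes zero, which is why a one hour horizon, in which many taxis serve at most one ride, can distort fleet-level comparisons. The names `taxi_friction`, `zero_friction_count`, and `mean_friction` are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical ride record: (pickup_time, dropoff_time) in minutes.
Ride = Tuple[float, float]


def taxi_friction(rides: List[Ride]) -> float:
    """Assumed gap-based friction: sum of idle gaps between consecutive rides.
    Taxis with fewer than two rides have zero friction by definition."""
    if len(rides) < 2:
        return 0.0
    ordered = sorted(rides)
    return sum(max(0.0, nxt[0] - prev[1])
               for prev, nxt in zip(ordered, ordered[1:]))


def zero_friction_count(fleet: Dict[str, List[Ride]]) -> int:
    """Number of taxis that served fewer than two rides."""
    return sum(1 for rides in fleet.values() if len(rides) < 2)


def mean_friction(fleet: Dict[str, List[Ride]]) -> float:
    """Fleet average, including the zero-friction taxis."""
    return sum(taxi_friction(r) for r in fleet.values()) / len(fleet)
```

For example, in a toy fleet `{"a": [], "b": [(0.0, 1.0)], "c": [(0.0, 10.0), (16.0, 20.0)]}`, two of the three taxis contribute zero, so the average is dominated by how many taxis happened to serve consecutive rides within the window.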
7.2.6 Time to pair with taxi & number of shared rides
Excluding the testcase with the smallest taxi fleet (\(\times 0.5\) the base number), the time to pair with taxi was zero, or close to zero, for all the evaluated algorithms. This shows the potential for efficiency gains and better utilization of resources using smart technologies. The time to pair with a taxi is low because, for step (b) of the ridesharing problem (matching (shared) rides to taxis), we run the offline algorithms in a just-in-time (JiT) manner, i.e., every time the set of rides (\({\mathcal {P}}_t\)) is not empty (see Sect. 4.1). We opted to do so for simplicity, since the alternative would require running all combinations of batch sizes for both steps (a) and (b). Results from step (a), though, suggest that running in batches is more beneficial (running in batches of two minutes consistently outperformed the JiT version, see Appendix A). There is a clear tradeoff: should the platform match with a taxi as soon as possible (JiT), so that a vehicle starts moving to pick up the ride earlier, or wait (match in batches every x minutes), potentially allowing for better matches? Answering this question remains open for future work.
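The two triggering policies contrasted above can be sketched in a few lines of Python. The sketch only decides when a matching round runs; it simplifies by assuming that every pending item is matched whenever a round runs, and the names `should_match` and `matching_times`, as well as the minute-level clock, are hypothetical.

```python
from typing import Dict, List, Optional


def should_match(now_min: int, pending: List[str], last_run_min: int,
                 batch_min: Optional[int] = None) -> bool:
    """JiT (batch_min is None): run whenever something is pending.
    Batch: run only once batch_min minutes have passed since the last run."""
    if not pending:
        return False
    if batch_min is None:
        return True
    return now_min - last_run_min >= batch_min


def matching_times(arrivals: Dict[int, List[str]], horizon: int,
                   batch_min: Optional[int] = None) -> List[int]:
    """Minutes at which a matching round runs. Simplification: every pending
    item is assumed to be matched in the round that processes it."""
    pending: List[str] = []
    runs: List[int] = []
    last = 0
    for t in range(horizon):
        pending.extend(arrivals.get(t, []))
        if should_match(t, pending, last, batch_min):
            runs.append(t)
            pending.clear()
            last = t
    return runs
```

With arrivals `{0: ["r1"], 1: ["r2"], 3: ["r3"]}` over a five minute horizon, JiT runs three rounds (minutes 0, 1, and 3), whereas `batch_min=2` runs two rounds (minutes 2 and 4) at the cost of the first request waiting two minutes before any taxi is assigned, which is exactly the tradeoff left open above.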
The number of shared rides is approximately the same for all the employed algorithms, with the notable exception of PG, which makes \(26\%\) fewer shared rides.
7.2.7 Relocation
The aim of any relocation strategy is to improve the spatial allocation of supply. Serving requests redistributes the taxis, resulting in an inefficient allocation. One could adopt a ‘lazy’ approach, relocating vehicles only to serve requests. While this minimizes the cost of serving a request (e.g., distance driven, fuel, etc.), it results in suboptimal QoS. Improving the QoS (especially the time to pickup, since it correlates highly with passenger satisfaction, see Sect. 3.1.2) plays a vital role in the growth of a company. Thus, a crucial tradeoff of any relocation scheme is improving the QoS metrics while minimizing the excess distance driven.
CARs with relocation successfully balance this tradeoff (Table 3). In particular, ALMA – the best performing overall – radically improves the QoS metrics by more than \(50\%\) (e.g., it decreases the pickup time by \(55\%\), and its standard deviation (SD) by \(58\%\)), while increasing the driving distance by only \(6\%\). The cumulative delay is decreased by \(43\%\).
As a final step, we evaluate end-to-end solutions, using MWM, ALMA, and Greedy to solve all three steps of the ridesharing problem. Figure 8 depicts the time to pickup (error bars denote one SD), a metric highly correlated with passenger satisfaction (Tang et al. 2017; Brown 2016b). We compare against the single ride baseline (no delay due to sharing a ride, see Sect. 4.1.13). Once more, the proposed relocation scheme results in radical improvements: compared to the single ride, the time to pickup drops from \(+14.09\%\) to \(-41.76\%\) for MWM, from \(+74.14\%\) to \(-9.33\%\) for ALMA, and from \(+86.10\%\) to \(-7.97\%\) for Greedy. This shows that simple relocation schemes can eliminate the negative effects of ridesharing on the QoS.
7.2.8 ALMA as an end-to-end CAR
While MWM seems to perform the best in the total distance driven and most QoS metrics – which is reasonable since it makes optimal matches amongst passengers – it is hard to scale and requires a centralized solution. In contrast, greedy approaches are appealing^{Footnote 11} not only due to their low complexity, but also because real-time constraints dictate short planning windows, which can diminish the benefit of batch optimization solutions compared to myopic approaches (Widdows et al. 2017).
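The gap between a greedy heuristic and an optimal maximum-weight matching can be seen on a toy instance. The Python sketch below is a generic illustration, not the paper's implementation: the greedy routine repeatedly takes the heaviest remaining edge whose endpoints are both free, while the exact optimum is found by brute force, which is only feasible for tiny graphs (a practical solver would use a polynomial-time method such as Edmonds' blossom algorithm). The edge weights are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[str, str, float]  # (u, v, weight); weights are hypothetical


def greedy_matching(edges: List[Edge]) -> List[Edge]:
    """Repeatedly take the heaviest remaining edge whose endpoints are both free."""
    matched, chosen = set(), []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u not in matched and v not in matched:
            chosen.append((u, v, w))
            matched.update((u, v))
    return chosen


def brute_force_mwm(edges: List[Edge]) -> List[Edge]:
    """Exact maximum-weight matching by exhaustive search (tiny graphs only)."""
    best: List[Edge] = []
    best_w = 0.0
    for k in range(1, len(edges) + 1):
        for subset in combinations(edges, k):
            nodes = [x for u, v, _ in subset for x in (u, v)]
            if len(nodes) == len(set(nodes)):  # no shared endpoints
                w = sum(e[2] for e in subset)
                if w > best_w:
                    best, best_w = list(subset), w
    return best


if __name__ == "__main__":
    path = [("a", "b", 2.0), ("b", "c", 3.0), ("c", "d", 2.0)]
    print(sum(e[2] for e in greedy_matching(path)),
          sum(e[2] for e in brute_force_mwm(path)))
```

On this path graph, the greedy routine grabs the heaviest edge (b, c) and blocks both neighbours, collecting 3.0, whereas the optimal matching {(a, b), (c, d)} collects 4.0.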
In fact, ALMA is of a greedy nature as well, albeit with a more intelligent back-off scheme; thus, there are scenarios where ALMA significantly outperforms Greedy, as the simulation results show. In particular, in more challenging scenarios (a smaller taxi fleet, or potentially different types of taxis) the smarter back-off mechanism makes a more pronounced difference.
Most importantly, ALMA was inherently developed for multi-agent applications. Agents make decisions locally, using completely uncoupled learning rules, and require only 1-bit partial feedback (Danassis et al. 2019), making ALMA an ideal candidate for an on-device implementation. This is fundamentally different from, for example, a decentralized implementation of the Greedy algorithm. Even in decentralized algorithms, the number of communication rounds required grows with the size of the problem. In practice, however, the real-time constraints limit the number of rounds, and thus the size of the problem that can be solved within them.
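The locality and 1-bit feedback described above can be illustrated with a simplified, ALMA-inspired back-off loop in Python. This is not the exact ALMA rule of Danassis et al. (2019): the back-off curve `backoff_prob` (1 minus the loss, clipped to [\(\epsilon\), 1 - \(\epsilon\)], echoing the role of \(\epsilon\) in the notes) and the loss definition (utility of the current target minus that of the next alternative) are assumptions made for illustration. Each agent only learns whether its own attempt succeeded and decides whether to back off using its own utilities alone.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Prefs = Dict[str, List[Tuple[str, float]]]  # agent -> [(resource, utility in [0, 1])], best first


def backoff_prob(loss: float, eps: float = 0.05) -> float:
    """Assumed back-off curve: a small loss makes yielding likely; clipped to [eps, 1 - eps]."""
    return min(1.0 - eps, max(eps, 1.0 - loss))


def alma_like(prefs: Prefs, eps: float = 0.05, max_rounds: int = 1000,
              seed: int = 0) -> Dict[str, str]:
    """Simplified ALMA-inspired allocation: each agent only observes whether its
    own attempt succeeded (1 bit) and backs off using its own utilities alone."""
    rng = random.Random(seed)
    idx = {a: 0 for a in prefs}          # position of each agent's current target
    holder: Dict[str, str] = {}          # resource -> agent that acquired it
    assignment: Dict[str, str] = {}      # agent -> acquired resource
    for _ in range(max_rounds):
        active = [a for a in prefs if a not in assignment and idx[a] < len(prefs[a])]
        if not active:
            break
        attempts: Dict[str, List[str]] = defaultdict(list)
        for a in active:
            attempts[prefs[a][idx[a]][0]].append(a)
        for res, agents in attempts.items():
            if res not in holder and len(agents) == 1:
                # Success bit: the resource was free and uncontested.
                holder[res] = agents[0]
                assignment[agents[0]] = res
                continue
            for a in agents:
                # Failure bit: move to the next alternative with probability P(loss).
                u_now = prefs[a][idx[a]][1]
                nxt = idx[a] + 1
                u_next = prefs[a][nxt][1] if nxt < len(prefs[a]) else 0.0
                if rng.random() < backoff_prob(u_now - u_next, eps):
                    idx[a] = nxt
    return assignment
```

For instance, `alma_like({"taxi1": [("ride1", 1.0), ("ride2", 0.2)], "taxi2": [("ride1", 0.9), ("ride2", 0.8)]})` lets the two taxis contend for `ride1`; `taxi2`, whose next option is almost as good, is the more likely to yield. No agent ever sees another agent's utilities, which is what makes this style of rule attractive for on-device execution.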
7.3 High-level analysis
Applying the modular approach we advocate allowed us to thoroughly test a wide variety of state-of-the-art algorithms for ridesharing. When dealing with a multi-objective optimization problem, it is unreasonable to expect to identify an approach that outperforms the competition across the board. Nevertheless, our findings provide convincing evidence to a ridesharing platform as to which CARs would be most suitable for a given set of objectives. Specifically: (i) CARs that rely on offline (in-batches) maximum-weight matching solutions perform well on global efficiency and passenger related metrics, (ii) CARs based on k-server algorithms perform well on platform related metrics (e.g., Bal), (iii) lightweight CARs perform better in real-world, large-scale settings due to short planning windows imposed by the requirement to run in real time, (iv) a simple, fine-grained relocation scheme based on the history of requests can significantly improve Quality of Service metrics by up to \(50\%\), and finally, (v) we identify a scalable, on-device CAR based on ALMA that performs well across the board. A summary of the results can be found in Table 4.
8 Conclusion
Managing transportation resources on a large scale remains a critical open problem. We initiate the systematic study of Component Algorithms for Ridesharing (CARs), a modular design methodology for ridesharing. To gain insight into the intricate dynamics of the problem, it is highly important to evaluate a diverse set of candidate solutions in settings designed to closely resemble reality. We evaluate a diverse set of candidate CARs (14 in total) – focused on the key algorithmic components of ridesharing – over 10 metrics, in settings designed to closely resemble reality in every aspect of the problem. To the best of our knowledge, this is the first end-to-end evaluation of this magnitude. We show the capacity of simple relocation schemes to improve QoS metrics radically, eliminating the negative effects of ridesharing, and identify an ALMA-based CAR that offers an efficient (across all metrics), scalable, on-device, end-to-end solution.
Notes
Throughout the paper, we use the term ‘ridesharing’ to refer to passengers (potentially) using the same vehicle at the same time, also referred to as ‘ride-pooling’ (Shaheen and Cohen 2019).
In this paper we do not explicitly model the drivers’ incentives. This is common in scenarios where vehicles are autonomous, or the drivers are hired on a per-hour basis by the platform.
In fact the latter two problems are quite closely connected, and algorithms for the k-server problem can be used to solve the k-taxi problem. See (Coester and Koutsoupias 2019) for more details.
For example UberPool has a waiting period of at most 2 minutes until you get a match (https://www.uber.com/au/en/ride/uberpool/), thus any algorithm has to run in under that time to be applicable in real life.
We do not report separate values on the platform’s QoS metrics.
It also ensures that the shared ride will cost less than the single ride option.
The parameter \(\epsilon\) places a threshold on the minimum / maximum back-off probability.
For the sake of completeness we have evaluated the High Capacity algorithm on much smaller test cases; see Appendix A.
Decreased delays can also in turn improve revenue by serving more requests in a fixed time window.
As a matter of fact, we tried zone-based relocation (generating zones based on historical data using the OPTICS clustering algorithm (Xu and Tian 2015), or using predefined clusters based on population density according to the NYC census data (https://guides.newman.baruch.cuny.edu/nyc_data/nbhoods)). Due to the vast number of requests, the only discernible clusters were of large regions (Manhattan, Bronx, Staten Island, Brooklyn, or Queens), which does not allow for fine-grained relocation. As a result, we achieved significantly inferior results.
NYC TLC has been providing data on yellow taxi trips since 2009.
January 15, 2016 was selected as a representative date for our simulations since it is not a holiday, and it is a Friday thus sampling for past requests results in a representative pattern (contrary to sampling on a weekend for example).
Missing components were too computationally expensive to simulate for an entire day.
We only present the most promising solutions.
Given a vector V of cumulative delays per request, the qth percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V.
References
Agatz N, Erera AL, Savelsbergh MW, Wang X (2011) Dynamic ridesharing: a simulation study in metro atlanta. Procedia Soc Behav Sci 17:532–550
Agatz N, Erera A, Savelsbergh M, Wang X (2012) Optimization for dynamic ridesharing: a review. Eur J Oper Res 223(2):295–303
Alonso-González MJ, van Oort N, Cats O, Hoogendoorn-Lanser S, Hoogendoorn S (2020) Value of time and reliability for urban pooled on-demand services. Transp Res Part C: Emerg Technol 115:102621
Alonso-Mora J, Samaranayake S, Wallar A, Frazzoli E, Rus D (2017a) On-demand high-capacity ridesharing via dynamic trip-vehicle assignment. Proceedings of the National Academy of Sciences
Alonso-Mora J, Wallar A, Rus D (2017b) Predictive routing for autonomous mobility-on-demand systems with ridesharing. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 3583–3590, https://doi.org/10.1109/IROS.2017.8206203
Asadpour A, Lobel I, van Ryzin G (2020) Minimum earnings regulation and the stability of marketplaces. In: Proceedings of the 21st ACM Conference on Economics and Computation, ACM, EC ’20
Ashlagi I, Azar Y, Charikar M, Chiplunkar A, Geri O, Kaplan H, Makhijani R, Wang Y, Wattenhofer R (2017) Min-cost bipartite perfect matching with delays. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik
Ashlagi I, Burq M, Dutta C, Jaillet P, Saberi A, Sholley C (2019) Edge weighted online windowed matching. In: Proceedings of the 2019 ACM Conference on Economics and Computation, ACM, EC ’19
Banerjee S, Freund D, Lykouris T (2017) Pricing and optimization in shared vehicle systems: An approximation framework. In: Proceedings of the 2017 ACM Conference on Economics and Computation, ACM
Bansal N, Buchbinder N, Madry A, Naor J (2015) A polylogarithmic-competitive algorithm for the k-server problem. J ACM 62(5):1–49
Bartal Y (1996) Probabilistic approximation of metric spaces and its algorithmic applications. In: Proc. of 37th Conference on Foundations of Computer Science, IEEE
Bartal Y, Grove E (2000) The harmonic k-server algorithm is competitive. Journal of the ACM (JACM)
Bathla K, Raychoudhury V, Saxena D, Kshemkalyani AD (2018) Real-time distributed taxi ride sharing. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), IEEE, pp 2044–2051
Bei X, Zhang S (2018) Algorithms for trip-vehicle assignment in ridesharing. In: Thirty-Second AAAI
Bélanger V, Kergosien Y, Ruiz A, Soriano P (2016) An empirical comparison of relocation strategies in real-time ambulance fleet management. Comput Ind Eng 94:216–229
Bellman R (1958) On a routing problem. Q Appl Math 16(1):87–90
Bertsekas DP (1998) Network optimization continuous and discrete models. Athena Scientific Belmont
Bienkowski M, Kraska A, Liu HH, Schmidt P (2018) A primal-dual online deterministic algorithm for matching with delays. In: International Workshop on Approximation and Online Algorithms, Springer
Bliek C, Bonami P, Lodi A (2014) Solving mixed-integer quadratic programming problems with IBM-CPLEX: a progress report. In: Proceedings of the twenty-sixth RAMP symposium, pp 16–17
Borodin A, El-Yaniv R (2005) Online computation and competitive analysis. Cambridge University Press, Cambridge
Brown T (2016) Matchmaking in lyft line — part 1. eng.lyft.com/matchmakinginlyftline9c2635fe62c4
Brown T (2016) Matchmaking in lyft line — part 2. eng.lyft.com/matchmakinginlyftline691a1a32a008
Buchbinder N, Coester C, Naor J (2020) Online \(k\)-taxi via double coverage and time-reverse primal-dual. arXiv:2012.02226
Buchholz N (2018) Spatial equilibrium, search frictions and dynamic efficiency in the taxi industry. Tech. rep., mimeo, Princeton University
Bürger M, Notarstefano G, Bullo F, Allgöwer F (2012) A distributed simplex algorithm for degenerate linear programs and multi-agent assignments. Automatica
Chen M, Shen W, Tang P, Zuo S (2019) Dispatching through pricing: modeling ridesharing and designing dynamic prices. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence
Chen R, Cassandras CG (2019) Optimization of ride sharing systems using event-driven receding horizon control. arXiv:1901.01919
Chrobak M, Larmore LL (1991a) An optimal online algorithm for k servers on trees. SIAM Journal on Computing
Chrobak M, Larmore LL (1991b) The server problem and online games. Online Algorithm 7:1
Chrobak M, Karloff H, Payne T, Vishwanathan S (1990) New results on server problems. SIAM Journal on Discrete Mathematics pp 291–300
Coester C, Koutsoupias E (2019) The online \(k\)-taxi problem. In: Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, ACM
Cordeau JF, Laporte G (2007) The dial-a-ride problem: models and algorithms. Ann Oper Res 153(1):29–46
Danassis P (2022) Scalable multi-agent coordination and resource sharing. PhD thesis, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne
Danassis P, Faltings B (2020) Efficient allocations in constant time: Towards scalable solutions in the era of large scale intelligent systems. In: Giacomo GD, Catalá A, Dilkina B, Milano M, Barro S, Bugarín A, Lang J (eds) ECAI 2020 – 24th European Conference on Artificial Intelligence, 29 August–8 September 2020, Santiago de Compostela, Spain, August 29 – September 8, 2020 – Including 10th Conference on Prestigious Applications of Artificial Intelligence (PAIS 2020), IOS Press, Frontiers in Artificial Intelligence and Applications, vol 325, pp 2895–2896, https://doi.org/10.3233/FAIA200441
Danassis P, Filos-Ratsikas A, Faltings B (2019) Anytime heuristic for weighted matching through altruism-inspired behavior. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, International Joint Conferences on Artificial Intelligence Organization, pp 215–222, https://doi.org/10.24963/ijcai.2019/31
Danassis P, Wiedemair F, Faltings B (2021) Improving multi-agent coordination by learning to estimate contention. In: Zhou ZH (ed) Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, International Joint Conferences on Artificial Intelligence Organization, pp 125–131, https://doi.org/10.24963/ijcai.2021/18, main Track
Danassis P, Triastcyn A, Faltings B (2022) A distributed differentially private algorithm for resource allocation in unboundedly large settings. In: Proceedings of the 21st International Conference on Autonomous Agents and Multi-Agent Systems, AAMAS-22, International Foundation for Autonomous Agents and Multiagent Systems
Dehghani S, Ehsani S, Hajiaghayi M, Liaghat V, Seddighin S (2017) Stochastic k-server: How should Uber work? In: 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017
Dickerson JP, Sankararaman KA, Srinivasan A, Xu P (2018) Allocation problems in ridesharing platforms: Online matching with offline reusable resources. In: Thirty-Second AAAI Conference on Artificial Intelligence
Dutta C, Sholley C (2018) Online matching in a ridesharing platform. arXiv preprint arXiv:1806.10327
Edmonds J (1965) Maximum matching and a polyhedron with 0,1-vertices. J Res Natl Bureau Stand B 69:55–56
Fagnant DJ, Kockelman KM (2018) Dynamic ridesharing and fleet sizing for a system of shared autonomous vehicles in austin, texas. Transportation 45(1):143–158
Fakcharoenphol J, Rao S, Talwar K (2003) A tight bound on approximating arbitrary metrics by tree metrics. In: Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, STOC ’03
Fakcharoenphol J, Rao S, Talwar K (2004) A tight bound on approximating arbitrary metrics by tree metrics. J Comput Syst Sci 69(3):485–497
Feder T, Greene D (1988) Optimal algorithms for approximate clustering. In: Proceedings of the twentieth annual ACM symposium on Theory of computing, ACM, pp 434–444
Fiat A, Rabani Y, Ravid Y (1994) Competitive k-server algorithms. J Comput Syst Sci 48(3):410–428
Fielbaum A, Alonso-Mora J (2020) Unreliability in ridesharing systems: Measuring changes in users’ times due to new requests. Transportation Research Part C: Emerging Technologies 121:102831. https://doi.org/10.1016/j.trc.2020.102831, https://www.sciencedirect.com/science/article/pii/S0968090X2030735X
Fielbaum A, Bai X, Alonso-Mora J (2021a) On-demand ridesharing with optimized pickup and dropoff walking locations. Transportation Research Part C: Emerging Technologies 126:103061. https://doi.org/10.1016/j.trc.2021.103061, https://www.sciencedirect.com/science/article/pii/S0968090X21000887
Fielbaum A, Kronmueller M, Alonso-Mora J (2021b) Anticipatory routing methods for an on-demand ride-pooling mobility system. Transportation pp 1–42
Furuhata M, Dessouky M, Ordóñez F, Brunet ME, Wang X, Koenig S (2013) Ridesharing: The state-of-the-art and future directions. Transp Res Part B: Methodol 57:28–46
Gao J, Wang Y, Tang H, Yin Z, Ni L, Shen Y (2017) An efficient dynamic ridesharing algorithm. In: 2017 IEEE International Conference on Computer and Information Technology (CIT), pp 320–325, https://doi.org/10.1109/CIT.2017.33
Garg N, Nazerzadeh H (2020) Driver surge pricing. In: Proceedings of the 21st ACM Conference on Economics and Computation, ACM, EC ’20
Ghili S, Kumar V (2020) Spatial distribution of supply and the role of market thickness: Theory and evidence from ridesharing. In: Proceedings of the 21st ACM Conference on Economics and Computation, ACM, EC ’20
Giordani S, Lujak M, Martinelli F (2010) A distributed algorithm for the multi-robot task allocation problem. In: Int. Conf. on Industrial, Engineering and Other Applications of Applied Intelligent Systems
Goemans MX, Williamson DP (1997) The primal-dual method for approximation algorithms and its application to network design problems. Approximation algorithms for NP-hard problems pp 144–191
Guériau M, Dusparic I (2018) SAMoD: Shared autonomous mobility-on-demand using decentralized reinforcement learning. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), IEEE
Guha S, Khuller S (1999) Greedy strikes back: improved facility location algorithms. J Algorithms 31(1):228–248
He S, Shin KG (2019) Spatio-temporal capsule-based reinforcement learning for mobility-on-demand network coordination. In: The World Wide Web Conference, WWW 2019, ACM, pp 2806–2813
Ho SC, Szeto W, Kuo YH, Leung JM, Petering M, Tou TW (2018) A survey of dial-a-ride problems: Literature review and recent developments. Transp Res Part B: Methodol 111:395–421
Hsu WL, Nemhauser GL (1979) Easy and hard bottleneck location problems. Discret Appl Math 1(3):209–215
Huang T, Fang B, Bei X, Fang F (2019) Dynamic trip-vehicle dispatch with scheduled and on-demand requests. In: The Conference on Uncertainty in Artificial Intelligence (UAI)
Ismail S, Sun L (2017) Decentralized Hungarian-based approach for fast and scalable task allocation. In: 2017 Int. Conf. on Unmanned Aircraft Systems (ICUAS)
Jiang S, Chen L, Mislove A, Wilson C (2018) On ridesharing competition and accessibility: Evidence from uber, lyft, and taxi. In: Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, ACM
Kooti F, Grbovic M, Aiello LM, Djuric N, Radosavljevic V, Lerman K (2017) Analyzing uber’s ridesharing economy. In: Proceedings of the 26th International Conference on World Wide Web Companion, 2017, ACM
Kosoresow AP (1996) Design and analysis of online algorithms for mobile server applications. PhD thesis, Stanford University, Stanford, CA, USA, aAI9702926
Koutsoupias E (2009) The k-server problem. Comput Sci Rev 3(2):105–118
Koutsoupias E, Papadimitriou CH (1995) On the k-server conjecture. J ACM (JACM) 42(5):971–983
Lee JR (2018) Fusible HSTs and the randomized k-server conjecture. In: 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), IEEE
Lesmana N, Zhang X, Bei X (2019) Balancing efficiency and fairness in on-demand ridesourcing. In: Proceedings of the 33rd Conference on Neural Information Processing Systems (NEURIPS)
Li M, Qin Z, Jiao Y, Yang Y, Wang J, Wang C, Wu G, Ye J (2019) Efficient ridesharing order dispatching with mean field multi-agent reinforcement learning. In: The World Wide Web Conference, WWW 2019, ACM
Lioris J, Cohen G, Seidowsky R, Salem HH (2016) Dynamic evolution and optimisation of an urban collective taxis systems by discrete-event simulation. In: ITS World Congress 2016, p 10
Lokhandwala M, Cai H (2018) Dynamic ride sharing using traditional taxis and shared autonomous taxis: A case study of nyc. Transportation Research Part C: Emerging Technologies 97:45–60. https://doi.org/10.1016/j.trc.2018.10.007, https://www.sciencedirect.com/science/article/pii/S0968090X18307551
Lowalekar M, Varakantham P, Jaillet P (2019) ZAC: A zone path construction approach for effective real-time ridesharing. In: ICAPS
Ma H, Fang F, Parkes DC (2019) Spatiotemporal pricing for ridesharing platforms. In: Proceedings of the 2019 ACM Conference on Economics and Computation, ACM, pp 583–583
Manasse M, McGeoch L, Sleator D (1988) Competitive algorithms for online problems. In: Proceedings of the twentieth annual ACM symposium on Theory of computing, ACM, pp 322–333
Manasse MS, McGeoch LA, Sleator DD (1990) Competitive algorithms for server problems. J Algorithms 11(2):208–230
Martínez LM, Correia GHA, Moura F, Mendes Lopes M (2017) Insights into carsharing demand dynamics: Outputs of an agent-based model application to Lisbon, Portugal. Int J Sustain Transp 11(2):148–159
Mourad A, Puchinger J, Chu C (2019) A survey of models and algorithms for optimizing shared mobility. Transp Res Part B: Methodol 123:323–34
Nanda V, Xu P, Sankararaman KA, Dickerson JP, Srinivasan A (2020) Balancing the tradeoff between profit and fairness in rideshare platforms during high-demand hours. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, AAAI Press
Pelzer D, Xiao J, Zehe D, Lees MH, Knoll AC, Aydt H (2015) A partition-based match making algorithm for dynamic ridesharing. IEEE Trans Intell Transp Syst 16(5):2587–2598. https://doi.org/10.1109/TITS.2015.2413453
Qian X, Zhang W, Ukkusuri SV, Yang C (2017) Optimal assignment and incentive design in the taxi group ride problem. Transp Res Part B: Methodol 103:208–226
Raghavan P, Snir M (1989) Memory versus randomization in online algorithms. In: International Colloquium on Automata, Languages, and Programming, Springer, pp 687–703
Riley C, van Hentenryck P, Yuan E (2020) Real-time dispatching of large-scale ridesharing systems: Integrating optimization, machine learning, and model predictive control. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20
Ruch C, Hörl S, Frazzoli E (2018) Amodeus, a simulation-based testbed for autonomous mobility-on-demand systems. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), IEEE
Rudec T, Baumgartner A, Manger R (2013) A fast work function algorithm for solving the k-server problem. CEJOR 21(1):187–205
Santi P, Resta G, Szell M, Sobolevsky S, Strogatz SH, Ratti C (2014) Quantifying the benefits of vehicle pooling with shareability networks. Proceedings of the National Academy of Sciences
Santos DO, Xavier EC (2013) Dynamic taxi and ridesharing: A framework and heuristics for the optimization problem. In: Twenty-Third International Joint Conference on Artificial Intelligence
Santos DO, Xavier EC (2015) Taxi and ride sharing: A dynamic dial-a-ride problem with money as an incentive. Expert Syst Appl 42(19):6728–6737
Shah S, Lowalekar M, Varakantham P (2020) Neural approximate dynamic programming for on-demand ride-pooling. Proceedings of the AAAI Conference on Artificial Intelligence 34(01):507–515. https://doi.org/10.1609/aaai.v34i01.5388, https://ojs.aaai.org/index.php/AAAI/article/view/5388
Shaheen S, Cohen A (2019) Shared ride services in north america: definitions, impacts, and the future of pooling. Transp Rev 39(4):427–442. https://doi.org/10.1080/01441647.2018.1497728
Silwal S, Gani MO, Raychoudhury V (2019) A survey of taxi ride sharing system architectures. In: 2019 IEEE International Conference on Smart Computing (SMARTCOMP), IEEE, pp 144–149
Simonetto A, Monteil J, Gambella C (2019) Real-time city-scale ridesharing via linear assignment problems. Transp Res Part C: Emerg Technol 101:208–232
Spieser K, Treleaven K, Zhang R, Frazzoli E, Morton D, Pavone M (2014) Toward a Systematic Approach to the Design and Evaluation of Automated Mobility-on-Demand Systems: A Case Study in Singapore, Springer International Publishing, Cham, pp 229–245. https://doi.org/10.1007/978-3-319-05990-7_20
Spieser K, Samaranayake S, Gruel W, Frazzoli E (2016) Shared-vehicle mobility-on-demand systems: a fleet operator’s guide to rebalancing empty vehicles. In: Transp. Research Board 95th Annual Meeting
Stiglic M, Agatz N, Savelsbergh M, Gradisar M (2015) The benefits of meeting points in ridesharing systems. Transportation Research Part B: Methodological 82:36–53. https://doi.org/10.1016/j.trb.2015.07.025, https://www.sciencedirect.com/science/article/pii/S0191261515002088
Sühr T, Biega AJ, Zehlike M, Gummadi KP, Chakraborty A (2019) Two-sided fairness for repeated matchings in two-sided markets: A case study of a ride-hailing platform. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM
Tang M, Ow S, Chen W, Cao Y, Lye K, Pan Y (2017) The data and science behind GrabShare carpooling. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
Tsao M, Milojevic D, Ruch C, Salazar M, Frazzoli E, Pavone M (2019) Model predictive control of ridesharing autonomous mobility-on-demand systems. In: 2019 International Conference on Robotics and Automation (ICRA), pp 6665–6671, https://doi.org/10.1109/ICRA.2019.8794194
van Engelen M, Cats O, Post H, Aardal K (2018) Enhancing flexible transport services with demand-anticipatory insertion heuristics. Transportation Research Part E: Logistics and Transportation Review 110:110–121. https://doi.org/10.1016/j.tre.2017.12.015, https://www.sciencedirect.com/science/article/pii/S1366554517307810
Vosooghi R, Puchinger J, Jankovic M, Vouillon A (2019) Shared autonomous vehicle simulation and service design. Transp Res Part C: Emerg Technol 107:15–33
Wallar A, Van Der Zee M, Alonso-Mora J, Rus D (2018) Vehicle rebalancing for mobility-on-demand systems with ridesharing. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 4539–4546, https://doi.org/10.1109/IROS.2018.8593743
Wen J, Zhao J, Jaillet P (2017) Rebalancing shared mobility-on-demand systems: A reinforcement learning approach. In: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), pp 220–225, https://doi.org/10.1109/ITSC.2017.8317908
Widdows D, Lucas J, Tang M, Wu W (2017) GrabShare: The construction of a real-time ridesharing service. In: 2017 2nd IEEE International Conference on Intelligent Transportation Engineering (ICITE)
Xu D, Tian Y (2015) A comprehensive survey of clustering algorithms. Annals of Data Science
Xu Y, Xu P (2020) Trade the system efficiency for the income equality of drivers in rideshare. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020
Xue R, Sun DJ, Chen S (2015) Short-term bus passenger demand prediction based on time series model and interactive multiple model approach. Discrete Dynamics in Nature and Society 2015. https://doi.org/10.1155/2015/682390
Yuen CF, Singh AP, Goyal S, Ranu S, Bagchi A (2019) Beyond shortest paths: Route recommendations for ridesharing. In: The World Wide Web Conference, ACM, pp 2258–2269
Zavlanos MM, Spesivtsev L, Pappas GJ (2008) A distributed auction algorithm for the assignment problem. In: Decision and Control, 2008, IEEE
Zhao B, Xu P, Shi Y, Tong Y, Zhou Z, Zeng Y (2019) Preference-aware task assignment in on-demand taxi dispatching: An online stable matching approach. Proceed AAAI Conf Artif Intell 33:2245–2252
Zhou C, Dai P, Li R (2013) The passenger demand prediction model on bus networks. In: 2013 IEEE 13th International Conference on Data Mining Workshops, pp 1069–1076, https://doi.org/10.1109/ICDMW.2013.20
Funding
Open access funding provided by EPFL Lausanne.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Simulation results in detail
We present in detail the results of Sect. 7.2. This includes larger test cases (the broader NYC area), the results of the algorithms omitted from the main text, and the omitted graphs and tables. For every metric, we report the average value over 8 runs.
Section A.1, 08:00–09:00 – Manhattan: We begin with our small test case: one hour (08:00–09:00), the base number of taxis (i.e., 4276; see Sect. 3.2.2), limited to Manhattan. Figure 9 and Table 5 depict all the evaluated metrics, and Table 5 also includes the standard deviation of each value. Table 6 presents the relative difference (percentage of gain or loss) compared to MWM (first line of the table). The remaining sections follow the same pattern. For each evaluation, we present two tables: one containing the absolute values, and one presenting the relative difference compared to the algorithm in the first line of the table. We were able to run most of the algorithms in this test case, with two exceptions. WFA was run only for \(\{\times 0.5, \times 0.75\}\) the base number of taxis. HC is so computationally heavy that we had to run a separate test case of only 10 minutes (see Sect. A.5).
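The two tables reported per evaluation can be illustrated with a minimal Python sketch. It first summarizes repeated runs into a mean and standard deviation per algorithm, as in the absolute-value table. It then expresses each mean as a percentage gain or loss relative to the algorithm in the first row, as in the relative-difference table. Several details are assumptions, not taken from the paper: the relative difference is computed as \((x - x_{\text{ref}})/x_{\text{ref}} \times 100\), the standard deviation is the sample standard deviation, and the function names and run values are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def summarize_runs(runs: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each algorithm to (mean, sample standard deviation) over its runs."""
    return {alg: (statistics.mean(vals), statistics.stdev(vals)) for alg, vals in runs.items()}


def relative_difference(means: Dict[str, float], reference: str) -> Dict[str, float]:
    """Percentage difference of each algorithm's mean versus the reference row.

    Assumed definition: (value - reference) / reference * 100.
    """
    ref = means[reference]
    if ref == 0:
        raise ValueError("reference value must be non-zero")
    return {alg: (value - ref) / ref * 100.0 for alg, value in means.items()}


# Hypothetical values of one metric over 8 runs for two algorithms (not real results).
runs = {
    "MWM": [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 100.0, 100.0],
    "Greedy": [110.0, 108.0, 112.0, 110.0, 109.0, 111.0, 110.0, 110.0],
}
summary = summarize_runs(runs)
means = {alg: mean for alg, (mean, _) in summary.items()}
rel = relative_difference(means, reference="MWM")
print(rel)
```

A positive value does not always mean a gain. For metrics where lower is better, such as delays, a positive relative difference indicates a loss.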
Offline algorithms (e.g., MWM, ALMA, Greedy) can be run either in a just-in-time (JiT) manner, i.e., when a request becomes critical, or in batches. Tables 7 and 8 evaluate the performance of each algorithm under each option. Since our dataset has a granularity of one minute, we ran batches of one and two minutes. Moreover, due to the large number of requests, at least one request becomes critical in every time step. Thus, JiT and batches of one minute produced exactly the same results. To allow for the evaluation of every algorithm (except HC), we ran this evaluation at a smaller scale, i.e., 2138 taxis (\(\times 0.5\) the base number of taxis). These tables also include the results for the WFA algorithm. Every other result presented in this paper uses the best-performing option for each algorithm (usually a batch size of two minutes).
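A small Python sketch can check the observation that JiT and one-minute batches coincide. It makes a simplifying assumption: a JiT dispatcher invokes the matching algorithm at every time step in which at least one pending request is critical, while a batch dispatcher invokes it at the last step of every window of `batch_size` time steps. If both invoke the same deterministic algorithm on the same pool of pending requests at the same steps, their results coincide. The function names and the per-step counts of critical requests are hypothetical.

```python
from typing import List, Set


def jit_dispatch_steps(critical_per_step: List[int]) -> Set[int]:
    """Time steps at which a JiT dispatcher runs: any step with >= 1 critical request."""
    return {t for t, n_critical in enumerate(critical_per_step) if n_critical > 0}


def batch_dispatch_steps(num_steps: int, batch_size: int) -> Set[int]:
    """Time steps at which a batch dispatcher runs: the last step of each window."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return {t for t in range(num_steps) if (t + 1) % batch_size == 0}


# Hypothetical number of requests turning critical in each one-minute step.
critical_per_step = [3, 1, 4, 2, 5, 1]
steps = len(critical_per_step)

jit = jit_dispatch_steps(critical_per_step)
one_minute = batch_dispatch_steps(steps, batch_size=1)
two_minute = batch_dispatch_steps(steps, batch_size=2)

print(jit == one_minute)   # True: every step has a critical request
print(sorted(two_minute))  # [1, 3, 5]: two-minute batches run every other step
```

The equivalence depends on the demand. If some step had no critical request, the JiT schedule would skip it, while one-minute batches would still run.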
Figure 10 shows the sequence of percentiles for the various delays introduced in Sect. 3.1.2, while Table 9 presents the complete results.
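A sequence of delay percentiles, as plotted in Fig. 10, can be computed with a short Python sketch. It interpolates linearly between the closest ranks, which matches the default `method="linear"` of NumPy's `percentile`. Whether the paper uses this or another percentile convention is an assumption, and the delay values below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile (0 <= p <= 100) with linear interpolation between closest ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= p <= 100.0:
        raise ValueError("p must lie in [0, 100]")
    ordered = sorted(values)
    rank = p / 100.0 * (len(ordered) - 1)  # fractional index into the sorted sample
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


def percentile_sequence(values: Sequence[float], ps: List[float]) -> Dict[float, float]:
    """Evaluate several percentiles of the same sample, e.g. for one type of delay."""
    return {p: percentile(values, p) for p in ps}


# Hypothetical delays (in minutes) of served requests; not data from the paper.
delays_minutes = [1.0, 2.0, 3.0, 4.0, 5.0]
print(percentile_sequence(delays_minutes, [0, 25, 50, 75, 100]))
# {0: 1.0, 25: 2.0, 50: 3.0, 75: 4.0, 100: 5.0}
```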
Finally, Fig. 11 shows that our results are robust to a varying number of vehicles, from 2138 to 12828 (i.e., from \(\times 0.5\) to \(\times 3\) the base number of taxis).
Section A.2, 00:00–23:59 (full day) – Manhattan: Next, we show that the results are robust to a larger timescale. As before, Fig. 12 and Tables 10 and 11 depict all the evaluated metrics.
Sections A.3, 08:00–09:00, and A.4, 00:00–23:59 (full day) – Broader NYC Area: In these two sections, we show that our results are robust to larger geographic areas. We consider the broader NYC area, comprising Manhattan, the Bronx, Staten Island, Brooklyn, and Queens. Figure 13 and Tables 12 and 13 depict all the evaluated metrics for one hour. Figure 14 and Tables 14 and 15 depict them for one full day.
Section A.5, 08:00–08:10 – Manhattan: Because of the high computational complexity of the HC algorithm, we evaluate it on this limited test case. Figure 15 and Tables 16 and 17 depict all the evaluated metrics.
Section A.6, Dynamic Vehicle Relocation – 00:00–23:59 (full day) – Manhattan: In this section, we present results on step (c) of the Ridesharing problem: dynamic relocation. We fix the algorithm for steps (a) and (b), specifically MWM. This provides a common ground for a fair comparison that focuses only on the relocation part. Figure 16 and Tables 18 and 19 depict all the evaluated metrics.
Section A.7, End-to-End Solution – 00:00–23:59 (full day) – Manhattan: As a final step, we evaluate end-to-end solutions that use MWM, ALMA, and Greedy to solve all three steps of the Ridesharing problem. Figure 17 and Tables 20 and 21 present all the evaluated metrics.
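The setups of Sects. A.6 and A.7 differ only in which component algorithm fills each step, and the modular CAR design makes this easy to express. A Python sketch can compose a pipeline from three interchangeable step functions. The class name, the step signature, and the placeholder steps are hypothetical. The placeholder steps merely record which algorithm ran, so the sketch illustrates the composition only, not the paper's actual components.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical step signature: a step takes the simulation state (here simply a
# log of executed steps) and returns the updated state. Real steps would operate
# on requests and vehicles instead.
Step = Callable[[List[str]], List[str]]


def make_step(algorithm: str, step_name: str) -> Step:
    """Hypothetical placeholder step that only records which algorithm ran it."""
    def step(log: List[str]) -> List[str]:
        return log + [f"{step_name}:{algorithm}"]
    return step


@dataclass(frozen=True)
class RidesharingPipeline:
    step_a: Step  # step (a) of the Ridesharing problem
    step_b: Step  # step (b)
    step_c: Step  # step (c): dynamic relocation

    def run_timestep(self, log: List[str]) -> List[str]:
        """Apply steps (a), (b), and (c) in order for one time step."""
        return self.step_c(self.step_b(self.step_a(log)))


def relocation_only(relocation_alg: str) -> RidesharingPipeline:
    """Sect. A.6 setup: MWM fixed for steps (a) and (b), relocation varied."""
    return RidesharingPipeline(make_step("MWM", "a"), make_step("MWM", "b"),
                               make_step(relocation_alg, "c"))


def end_to_end(alg: str) -> RidesharingPipeline:
    """Sect. A.7 setup: the same algorithm solves all three steps."""
    return RidesharingPipeline(make_step(alg, "a"), make_step(alg, "b"),
                               make_step(alg, "c"))


print(relocation_only("Greedy").run_timestep([]))  # ['a:MWM', 'b:MWM', 'c:Greedy']
print(end_to_end("ALMA").run_timestep([]))         # ['a:ALMA', 'b:ALMA', 'c:ALMA']
```

Holding steps (a) and (b) fixed isolates the effect of the relocation component. Any difference in the metrics can then be attributed to step (c) alone.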
A.1 08:00–09:00 – Manhattan
Figures 9, 10, and 11. Tables 5, 6, 7, 8, and 9.
A.2 00:00–23:59 (full day) – Manhattan
A.3 08:00–09:00 – Broader NYC Area (Manhattan, Bronx, Staten Island, Brooklyn, Queens)
A.4 00:00–23:59 (full day) – Broader NYC Area (Manhattan, Bronx, Staten Island, Brooklyn, Queens)
A.5 08:00–08:10 – Manhattan
A.6 Dynamic vehicle relocation – 00:00–23:59 (full day) – Manhattan
A.7 End-to-End Solution – 00:00–23:59 (full day) – Manhattan
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Danassis, P., Sakota, M., Filos-Ratsikas, A. et al. Putting ridesharing to the test: efficient and scalable solutions and the power of dynamic vehicle relocation. Artif Intell Rev 55, 5781–5844 (2022). https://doi.org/10.1007/s10462-022-10145-0
DOI: https://doi.org/10.1007/s10462-022-10145-0