Offline policy reuse-guided anytime online collective multiagent planning and its application to mobility-on-demand systems

Published in Autonomous Agents and Multi-Agent Systems (2024).

Abstract

The popularity of mobility-on-demand (MoD) systems has spurred interest in online collective multiagent planning (Online_CMP), where spatially distributed service agents are planned to meet dynamically arriving demands. For city-scale MoD systems with large fleets of agents, Online_CMP methods must trade off computation time (i.e., real-time operation) against solution quality (i.e., the number of demands served). Directly executing an offline policy guarantees real-time operation, but the policy cannot adapt to the actual agent and demand distributions. Search-based online planning methods are adaptive, but they are computationally expensive and do not scale. In this paper, we propose a principled Online_CMP method that reuses and improves the offline policy in an anytime manner. We first model MoD systems as a collective Markov decision process (\({\mathbb {C}}\)-MDP), in which the collective behavior of agents determines the joint reward. Given the \({\mathbb {C}}\)-MDP model, we propose a novel state value function for evaluating a policy, and a gradient ascent (GA) technique for improving it. We further show that offline GA-based policy iteration (GA-PI) converges to the global optimum of the \({\mathbb {C}}\)-MDP under certain conditions. Finally, given real-time information, the offline policy serves as the default plan, and GA-PI improves on it to generate an online plan. Experimental results show that our offline policy reuse-guided Online_CMP method significantly outperforms standard online multiagent planning methods on MoD systems such as ride-sharing and security traffic patrolling, in terms of both computation time and solution quality.
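To give the abstract's pipeline a concrete shape, the following minimal sketch runs GA-based policy iteration on a toy collective MDP. Everything here is an illustrative assumption rather than the authors' implementation (see the code repository referenced in the notes for that): the zone counts and demands are invented, a mean-field rollout stands in for the paper's collective state value function, a finite-difference gradient stands in for whatever gradient form the paper derives, and a uniform policy stands in for the offline default policy.

```python
import numpy as np

# Toy collective MDP: a fleet of anonymous, homogeneous agents spread over
# Z zones must cover a time-varying demand over a horizon of H steps.
Z, H = 4, 3
demand = np.array([[5.0, 1.0, 2.0, 8.0]] * H)  # demand per zone, per step

def rollout(pi, n0):
    """Expected number of demands served by policy pi from counts n0.

    pi[t, z] is a distribution over target zones for agents located in
    zone z at step t; because agents are interchangeable, the collective
    state evolves as a flow of counts (a mean-field approximation).
    """
    n, served = n0.astype(float), 0.0
    for t in range(H):
        n = n @ pi[t]                              # redistribute agents
        served += np.minimum(n, demand[t]).sum()   # demands actually met
    return served

def renormalize(pi):
    """Keep every row of pi a valid probability distribution."""
    pi = np.clip(pi, 1e-9, None)
    return pi / pi.sum(axis=-1, keepdims=True)

def ga_pi(n0, iters=150, lr=0.05, eps=1e-4):
    """Gradient-ascent policy iteration: evaluate, ascend, renormalize."""
    pi = np.ones((H, Z, Z)) / Z                    # stand-in default policy
    for _ in range(iters):
        base = rollout(pi, n0)
        grad = np.zeros_like(pi)
        for idx in np.ndindex(*pi.shape):          # finite-difference grad
            pert = pi.copy()
            pert[idx] += eps
            grad[idx] = (rollout(renormalize(pert), n0) - base) / eps
        pi = renormalize(pi + lr * grad)           # improve the policy
    return pi

n0 = np.array([4.0, 4.0, 4.0, 4.0])               # 16 agents over 4 zones
pi = ga_pi(n0)
print("expected demands served:", rollout(pi, n0))
```

In the online setting the abstract describes, the uniform initialization above would be replaced by the precomputed offline policy, and the GA-PI loop would run under an anytime computation budget, returning the best plan found when time runs out.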


Availability of data and materials

The dataset used in this work will be open-sourced once the paper is accepted.

Notes

  1. Our method can be easily extended to general settings in which agents have a randomized distribution over states.

  2. From here on, we assume no discounting, although for completeness the algorithm retains the possibility of discounting; a discounted value backup is sketched after these notes.

  3. https://github.com/TrafficRun.
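
As a concrete reading of note 2, here is the policy-evaluation backup in generic MDP notation with an optional discount factor \(\gamma \); this notation is an illustrative assumption, and the paper's collective state value function differs in detail:

\[ V^{\pi }(s_t) = r\bigl (s_t, \pi (s_t)\bigr ) + \gamma \, {\mathbb {E}}_{s_{t+1}}\bigl [ V^{\pi }(s_{t+1}) \bigr ]. \]

Setting \(\gamma = 1\) recovers the undiscounted case assumed in the discussion, while \(\gamma < 1\) gives the discounted variant that the algorithm retains.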


Acknowledgements

This research was supported in part by the Key Research and Development Projects in Jiangsu Province under Grant BE2021001-2, and in part by the National Natural Science Foundation of China (62076060, 62072099, 61932007, 61806053).

Author information

Contributions

Wanyuan Wang and Qian Che designed the algorithm and implemented the experiments; Yifeng Zhou, Bo An, and Yichuan Jiang wrote the main manuscript text; and Weiwei Wu analyzed the algorithm.

Corresponding author

Correspondence to Qian Che.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, W., Che, Q., Zhou, Y. et al. Offline policy reuse-guided anytime online collective multiagent planning and its application to mobility-on-demand systems. Auton Agent Multi-Agent Syst 38, 19 (2024). https://doi.org/10.1007/s10458-024-09650-z
