Effective credit assignment deep policy gradient multi-agent reinforcement learning for vehicle dispatch

Huang, Xiaohui; Zhang, Xiong; Ling, Jiahao; Cheng, Xuebo

doi:10.1007/s10489-023-04689-z

Published: 11 July 2023

Applied Intelligence, Volume 53, pages 23457–23469 (2023)

Xiaohui Huang¹, Xiong Zhang¹ (ORCID: orcid.org/0000-0003-1526-2276), Jiahao Ling¹ & Xuebo Cheng¹

Abstract

Online car-hailing platforms have given people more travel options and greater convenience. However, the 'tidal phenomenon' of travel often leads to an imbalance between vehicle supply and demand, especially during peak hours. In this paper, we propose Credit Assignment Deep Policy Gradient (CADPG), a multi-agent reinforcement learning algorithm for fleet dispatch with effective credit assignment. First, the CADPG model learns an action for each agent (i.e., vehicle) from the vehicle's local state through the policy network. Second, a hyper-network takes the global state as input and learns a set of credit assignment parameters used to compute the total Q value. Finally, the joint action vectors and the parameters produced by the hyper-network are fed into the critic network to obtain the total Q value of the joint actions. Experiments on real datasets show that the proposed method outperforms the compared algorithms.
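The three steps in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the layer sizes, the single-layer linear "networks", the policy weights shared across vehicles, and all function and variable names are hypothetical, and the non-negative credit weights (via an absolute value) are borrowed from common value-mixing designs as an assumption, since the abstract does not say how the parameters enter the critic.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
N_AGENTS, LOCAL_DIM, GLOBAL_DIM, N_ACTIONS = 4, 6, 10, 5
rng = np.random.default_rng(0)

def policy(local_state, w_pi):
    """Policy network stand-in: softmax action vector from one vehicle's local state."""
    logits = local_state @ w_pi
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

def hyper_network(global_state, w_h, b_h):
    """Hyper-network stand-in: global state -> credit assignment parameters."""
    weights = np.abs(global_state @ w_h)  # assumption: non-negative credit per agent
    bias = float(global_state @ b_h)
    return weights, bias

def critic(joint_actions, weights, bias, w_q):
    """Critic stand-in: score each agent's action vector, then mix into one total Q."""
    per_agent_q = joint_actions @ w_q  # shape (N_AGENTS,)
    return float(weights @ per_agent_q + bias)

# Randomly initialised linear layers stand in for trained networks.
w_pi = rng.normal(size=(LOCAL_DIM, N_ACTIONS))
w_h = rng.normal(size=(GLOBAL_DIM, N_AGENTS))
b_h = rng.normal(size=GLOBAL_DIM)
w_q = rng.normal(size=N_ACTIONS)

local_states = rng.normal(size=(N_AGENTS, LOCAL_DIM))
global_state = rng.normal(size=GLOBAL_DIM)

# Step 1: each agent derives an action vector from its own local state.
joint_actions = np.stack([policy(s, w_pi) for s in local_states])
# Step 2: the hyper-network turns the global state into credit parameters.
weights, bias = hyper_network(global_state, w_h, b_h)
# Step 3: the critic combines both into the total Q value.
q_total = critic(joint_actions, weights, bias, w_q)
print(f"total Q = {q_total:.4f}")
```

In this sketch the policy sees only local information while the credit parameters depend on the global state, which mirrors the split described in the abstract; training (policy gradient and critic updates) is omitted entirely.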

Notes

Data source: https://outreach.didichuxing.com/research/opendata/en/

Author information

Authors and Affiliations

School of Information Engineering Department, East China Jiaotong University, 330013, Nanchang, China
Xiaohui Huang, Xiong Zhang, Jiahao Ling & Xuebo Cheng

Corresponding author

Correspondence to Xiong Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, X., Zhang, X., Ling, J. et al. Effective credit assignment deep policy gradient multi-agent reinforcement learning for vehicle dispatch. Appl Intell 53, 23457–23469 (2023). https://doi.org/10.1007/s10489-023-04689-z

Accepted: 05 May 2023
Published: 11 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10489-023-04689-z
