Effective credit assignment deep policy gradient multi-agent reinforcement learning for vehicle dispatch

Applied Intelligence

Abstract

Online car-hailing platforms have given people more travel options and greater convenience. However, the 'tidal phenomenon' of travel often leads to an imbalance between vehicle supply and demand, especially during peak hours. In this paper, we propose a multi-agent reinforcement learning algorithm for fleet dispatch, Credit Assignment Deep Policy Gradient (CADPG). First, the policy network learns an action for each agent (i.e., vehicle) from that vehicle's local state. Second, a hyper-network takes the global state as input and learns a set of credit-assignment parameters used to compute the total Q value. Finally, the joint action vector and the parameters produced by the hyper-network are fed into the critic network to obtain the total Q value of the joint action. Experiments on real datasets show that the proposed method outperforms the compared algorithms.
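The three steps in the abstract can be illustrated with a minimal NumPy sketch of the forward pass. This is not the authors' implementation: the layer sizes, the single linear layers, the policy shared across agents, and the non-negative (QMIX-style) credit weights are all assumptions made for illustration, and every name below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 vehicles, 6 local features, 10 global features,
# 5 discrete dispatch actions, 8 hidden units in the critic.
N_AGENTS, LOCAL_DIM, GLOBAL_DIM, N_ACTIONS, HIDDEN = 4, 6, 10, 5, 8


def init_linear(in_dim, out_dim):
    """Small random weights and zero bias for one linear layer."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


# Policy network, shared by all agents (assumption).
W_pi, b_pi = init_linear(LOCAL_DIM, N_ACTIONS)
# Hyper-network: maps the global state to the critic's weights.
W_h1, b_h1 = init_linear(GLOBAL_DIM, N_AGENTS * N_ACTIONS * HIDDEN)
W_h2, b_h2 = init_linear(GLOBAL_DIM, HIDDEN)


def policy(local_states):
    """Step 1: per-agent action probabilities from local states only."""
    return softmax(local_states @ W_pi + b_pi)


def credit_params(global_state):
    """Step 2: credit-assignment parameters generated from the global state.

    Taking absolute values keeps the weights non-negative (an assumption
    borrowed from QMIX-style mixing, not stated in the abstract).
    """
    w1 = np.abs(global_state @ W_h1 + b_h1).reshape(N_AGENTS * N_ACTIONS, HIDDEN)
    w2 = np.abs(global_state @ W_h2 + b_h2)
    return w1, w2


def total_q(joint_actions, w1, w2):
    """Step 3: critic combines the joint action vector with generated weights."""
    hidden = np.maximum(joint_actions.reshape(-1) @ w1, 0.0)  # ReLU
    return float(hidden @ w2)


local_states = rng.normal(size=(N_AGENTS, LOCAL_DIM))
global_state = rng.normal(size=GLOBAL_DIM)

probs = policy(local_states)            # shape (N_AGENTS, N_ACTIONS)
w1, w2 = credit_params(global_state)    # generated critic weights
q_tot = total_q(probs, w1, w2)          # scalar total Q value
```

In this sketch the actors see only local states, while the global state enters solely through the generated critic weights, which is one way the credit each vehicle's action receives can depend on the city-wide situation.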

Notes

  1. Data source: https://outreach.didichuxing.com/research/opendata/en/

Author information

Corresponding author

Correspondence to Xiong Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, X., Zhang, X., Ling, J. et al. Effective credit assignment deep policy gradient multi-agent reinforcement learning for vehicle dispatch. Appl Intell 53, 23457–23469 (2023). https://doi.org/10.1007/s10489-023-04689-z

  • DOI: https://doi.org/10.1007/s10489-023-04689-z
