Traffic signal control using a cooperative EWMA-based multi-agent reinforcement learning

Applied Intelligence

Abstract

In contemporary urban areas, traffic signal control remains enormously difficult. Multi-agent reinforcement learning (MARL) is a promising way to solve this problem. However, most MARL algorithms cannot effectively transfer learned strategies when the number of agents increases or decreases. This paper proposes a new MARL algorithm, cooperative dynamic delay updating twin delayed deep deterministic policy gradient based on the exponentially weighted moving average (CoTD3-EWMA), to address this limitation. By introducing mean-field theory, the algorithm implicitly models the interaction between agents and the environment, which reduces the dimension of the action space and improves scalability. In addition, we propose a dynamic delay updating method based on the exponentially weighted moving average (EWMA), which alleviates the Q-value overestimation problem of the traditional TD3 algorithm. Moreover, a joint reward allocation mechanism and a state sharing mechanism are proposed to improve the agents' global strategy learning ability and robustness. Simulation results show that the new algorithm outperforms current state-of-the-art algorithms, effectively reducing vehicle delay time and improving the efficiency of the traffic network.
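The two building blocks named in the abstract, the EWMA and the mean-field summary of other agents' actions, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the update rule, so the function names, the smoothing factor `beta`, and in particular the `policy_delay` rule with its threshold and delay values are assumptions made only for illustration.

```python
from typing import Iterable, List, Sequence


def ewma(values: Iterable[float], beta: float = 0.9) -> List[float]:
    """Running EWMA: m_t = beta * m_{t-1} + (1 - beta) * x_t, seeded with m_0 = x_0."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    out: List[float] = []
    for x in values:
        # The first sample seeds the average; later samples are blended in.
        out.append(x if not out else beta * out[-1] + (1.0 - beta) * x)
    return out


def mean_action(neighbor_actions: Sequence[Sequence[float]]) -> List[float]:
    """Average the neighbours' action vectors (a mean-field summary of the other agents)."""
    n = len(neighbor_actions)
    if n == 0:
        raise ValueError("need at least one neighbour")
    return [sum(column) / n for column in zip(*neighbor_actions)]


def policy_delay(smoothed_error: float, threshold: float = 1.0,
                 short_delay: int = 2, long_delay: int = 4) -> int:
    """Hypothetical rule (an assumption, not from the paper): update the actor
    less often while the smoothed critic error is still large."""
    return long_delay if abs(smoothed_error) > threshold else short_delay
```

Because `mean_action` returns a vector whose size does not depend on the number of neighbours, a critic that consumes it keeps the same input dimension when agents are added or removed, which is the scalability argument the abstract makes.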

Acknowledgements

This work was supported by National Natural Science Foundation of China (No. 61973244, 72001214).

Author information

Corresponding author

Correspondence to Liangjun Ke.

About this article

Cite this article

Qiao, Z., Ke, L. & Wang, X. Traffic signal control using a cooperative EWMA-based multi-agent reinforcement learning. Appl Intell 53, 4483–4498 (2023). https://doi.org/10.1007/s10489-022-03643-9

  • DOI: https://doi.org/10.1007/s10489-022-03643-9
