
Intelligent Traffic Light via Policy-based Deep Reinforcement Learning


Intelligent traffic lights in smart cities can reduce traffic congestion. In this study, we employ reinforcement learning to train the control agent of a traffic light on an urban mobility simulator. Unlike existing work, we use a policy-based deep reinforcement learning method, Proximal Policy Optimization (PPO), rather than value-based methods such as Deep Q Network (DQN) and Double DQN (DDQN). First, we compare the optimal policy obtained from PPO with those obtained from DQN and DDQN and find that the PPO policy performs better. Next, instead of fixed-interval traffic light phases, we adopt light phases with variable time intervals, which yield a better policy for passing the traffic flow. Then, we study the effects of environment and action disturbances and show that the learning-based controller is robust. Finally, we consider unbalanced traffic flows and find that an intelligent traffic light performs moderately well in unbalanced traffic scenarios, even though it learns its policy from balanced traffic scenarios only.
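To make the policy-based versus value-based distinction concrete, the following is a minimal sketch of the quantities each family of methods optimizes: the clipped surrogate objective from the PPO paper, and the bootstrapped targets used by DQN and Double DQN. It is not the authors' implementation. The function names, the clipping range `clip_eps=0.2`, and the discount `gamma=0.99` are illustrative assumptions and are not taken from this article.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice between the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN target: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target,
                      gamma=0.99, done=False):
    """Double DQN target: the online network selects the action and the
    target network evaluates it, which reduces overestimation."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]
```

In this sketch, PPO updates the policy directly and limits how far the probability ratio can move per update, whereas DQN and DDQN learn action values and derive the policy by acting greedily on them.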



Data availability

The simulation videos can be found on the website link provided in the manuscript. Some or all data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request.







Xiao acknowledges support from the US Department of Education (ED#P116S210005).


The US Department of Education (ED#P116S210005).

Author information

Authors and Affiliations



YZ initiated the study, carried out methodology development and simulations, and drafted the initial manuscript. MC conceived of the study and helped to draft the manuscript. CWS conceived of the study and helped to draft the manuscript. JL helped to draft the manuscript. SX supervised the study and was a major contributor in finalizing the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shaoping Xiao.

Ethics declarations

Ethics Approval and Consent to Participate

Not applicable.

Consent for Publication

All authors have approved the manuscript and given their consent for submission and publication.

Competing Interests

The authors declare no competing financial interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, Y., Cai, M., Schwarz, C.W. et al. Intelligent Traffic Light via Policy-based Deep Reinforcement Learning. Int. J. ITS Res. (2022).



  • Traffic light control
  • Reinforcement learning
  • Policy