Targeted Knowledge Transfer for Learning Traffic Signal Plans

  • Nan Xu
  • Guanjie Zheng
  • Kai Xu
  • Yanmin Zhu (corresponding author)
  • Zhenhui Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11440)

Abstract

Traffic signal control in cities today is not well optimized with respect to feedback from the real world. This inefficiency wastes commuters' time, provokes road rage in traffic jams, and raises the cost of city operation. Recently, deep reinforcement learning (DRL) approaches have shed light on how to better optimize traffic signal plans according to feedback received from the environment. Most of these methods are evaluated in a simulated environment but cannot be applied directly to real-world intersections, because DRL training requires a large number of samples and takes a long time to converge. In this paper, we propose a batch learning framework in which targeted transfer reinforcement learning (TTRL-B) is introduced to speed up learning. Specifically, a separate unsupervised method is designed to measure the similarity of traffic conditions and thereby select a suitable source intersection for transfer. The proposed framework allows batch learning, and this is the first work to consider the impact of slow learning in RL on real-world applications. Experiments on real traffic data demonstrate that our model accelerates learning while maintaining good performance.
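
As a rough illustration of the source-selection idea described above, the following minimal sketch picks, from several candidate intersections, the one whose traffic pattern is closest to that of the target. It is not the similarity measure used in this paper: the normalized volume profiles, the Euclidean distance, the function names, and the vehicle counts are all assumptions made purely for illustration.

```python
import math


def normalize(profile):
    """Scale a volume profile so its entries sum to 1 (compare shape, not magnitude)."""
    total = sum(profile)
    if total == 0:
        raise ValueError("profile contains no traffic")
    return [v / total for v in profile]


def distance(a, b):
    """Euclidean distance between two equal-length profiles (an assumed measure)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_source(target, candidates):
    """Return the name of the candidate whose normalized profile is closest to the target."""
    t = normalize(target)
    return min(candidates, key=lambda name: distance(t, normalize(candidates[name])))


# Hypothetical vehicle counts per time slot; not data from the paper.
target = [10, 40, 30, 20]
candidates = {
    "A": [100, 100, 100, 100],  # flat demand
    "B": [20, 85, 55, 40],      # same shape as the target, roughly twice the volume
    "C": [50, 10, 10, 30],      # opposite peak pattern
}
best = select_source(target, candidates)
print(best)
```

Under these assumptions, intersection "B" is selected, because normalization removes the difference in overall volume and leaves only the shape of the demand curve to compare. A model trained at the selected intersection would then serve as the starting point for transfer.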

Keywords

Deep reinforcement learning · Transfer learning · Traffic signal control

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Nan Xu (1)
  • Guanjie Zheng (2)
  • Kai Xu (3)
  • Yanmin Zhu (1), corresponding author
  • Zhenhui Li (2)

  1. Shanghai Jiao Tong University, Shanghai, China
  2. Pennsylvania State University, University Park, USA
  3. Shanghai Tianrang Intelligent Technology Co., Ltd, Shanghai, China
