Recursive least-squares temporal difference learning for adaptive traffic signal control at intersection

  • Biao Yin
  • Mahjoub Dridi
  • Abdellah El Moudni
Original Article

Abstract

This paper presents a new method for solving the scheduling problem of adaptive traffic signal control at an intersection. The method integrates recursive least-squares temporal difference (RLS-TD(λ)) learning into approximate dynamic programming. RLS-TD(λ) adapts a linear function approximation by updating its parameters from environmental feedback. After modeling the traffic dynamics at an intersection as a discrete-time system, the study investigates how the method is implemented. The model considers different traffic control schemes with respect to the signal phase sequence, in particular the adaptive phase sequence (APS) defined in this paper. Simulations of traffic scenarios show that RLS-TD(λ) outperforms TD(λ) in updating the parameters of the function approximation, and that APS outperforms other conventional control schemes in reducing traffic delay. Compared with other traffic signal control algorithms, the proposed algorithm yields satisfactory results in terms of traffic delay and computation time.
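The abstract describes RLS-TD(λ) only at a high level. As a hedged illustration, the sketch below implements the standard RLS-TD(λ) recursion for a linear value function V(s) = θᵀφ(s), as it appears in the reinforcement learning literature, next to a plain TD(λ) learner with a fixed step size. Both are run on a hypothetical three-state cycle with one-hot features and a constant per-step reward. The class names, the toy chain, and the parameters `gamma`, `lam`, `alpha`, `mu` (forgetting factor) and `delta` (initial scale of the matrix P) are assumptions for illustration only; this is not the authors' traffic model, feature set, or delay cost.

```python
import numpy as np


class RLSTDLambda:
    """RLS-TD(lambda) sketch for a linear value function V(s) = theta . phi(s)."""

    def __init__(self, n_features, gamma=0.9, lam=0.5, mu=1.0, delta=1e3):
        self.gamma, self.lam, self.mu = gamma, lam, mu
        self.theta = np.zeros(n_features)      # approximation parameters
        self.P = delta * np.eye(n_features)    # P_0 = delta * I (large delta = weak prior)
        self.z = np.zeros(n_features)          # accumulating eligibility trace

    def update(self, phi, reward, phi_next):
        self.z = self.gamma * self.lam * self.z + phi
        d = phi - self.gamma * phi_next        # phi_t - gamma * phi_{t+1}
        Pz = self.P @ self.z
        denom = self.mu + d @ Pz
        gain = Pz / denom                      # RLS gain vector
        self.theta = self.theta + gain * (reward - d @ self.theta)
        # Sherman-Morrison style update of P, scaled by the forgetting factor mu
        self.P = (self.P - np.outer(Pz, d @ self.P) / denom) / self.mu


class TDLambda:
    """Plain TD(lambda) with a fixed step size alpha, for comparison."""

    def __init__(self, n_features, gamma=0.9, lam=0.5, alpha=0.1):
        self.gamma, self.lam, self.alpha = gamma, lam, alpha
        self.theta = np.zeros(n_features)
        self.z = np.zeros(n_features)

    def update(self, phi, reward, phi_next):
        td_error = reward + self.gamma * phi_next @ self.theta - phi @ self.theta
        self.z = self.gamma * self.lam * self.z + phi
        self.theta = self.theta + self.alpha * td_error * self.z


def run(learner, n_states=3, steps=300, reward=1.0):
    """Hypothetical deterministic cycle 0 -> 1 -> ... -> 0 with one-hot features."""
    features = np.eye(n_states)
    s = 0
    for _ in range(steps):
        s_next = (s + 1) % n_states
        learner.update(features[s], reward, features[s_next])
        s = s_next
    return learner.theta


true_value = 1.0 / (1.0 - 0.9)   # constant reward 1, gamma = 0.9 -> V = 10 in every state
rls_err = np.max(np.abs(run(RLSTDLambda(3)) - true_value))
td_err = np.max(np.abs(run(TDLambda(3)) - true_value))
print(f"RLS-TD(lambda) max error: {rls_err:.4f}")
print(f"TD(lambda)     max error: {td_err:.4f}")
```

On this toy chain, RLS-TD(λ) ends much closer to the true value than TD(λ) with α = 0.1 after the same number of transitions, because it solves the least-squares TD fixed point recursively instead of taking small gradient-like steps. The price is an O(n²) update per step for n features instead of O(n). A forgetting factor μ < 1 discounts older samples, which might help track time-varying demand; the abstract does not say how the paper sets μ, and a delay-minimizing controller would feed a per-step cost in place of the reward used here.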

Keywords

Adaptive traffic signal control; Recursive least-squares temporal difference; Approximate dynamic programming; Adaptive phase sequence

Notes

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Copyright information

© The Natural Computing Applications Forum 2017

Authors and Affiliations

  1. LVMT-City Mobility Transport Laboratory, École des Ponts ParisTech, IFSTTAR, UPEM, Champs-sur-Marne, France
  2. NIT-O2S, Université de technologie de Belfort-Montbéliard, Belfort, France
