
Dyna-PPO reinforcement learning with Gaussian process for the continuous action decision-making in autonomous driving


Abstract

Recent years have witnessed rapid development in autonomous driving. Model-based and model-free reinforcement learning are two popular approaches to learning driving policies, and each has its own advantages for achieving a good driving experience. The Dyna framework is a promising way to combine these advantages and improve efficiency and performance. Unfortunately, the classical Dyna framework cannot handle continuous actions in reinforcement learning. In addition, the interaction between the world model and the model-free reinforcement learning agent remains unidirectional and limited to the data level. To further improve the effectiveness and efficiency of driving policy learning, this paper proposes a novel Gaussian-process-based Dyna-PPO approach. The Gaussian process model, which is analytically tractable and well suited to small-sample problems, is introduced to build the world model. In addition, we design a mechanism that realizes bidirectional interaction between the world model and the policy model. Extensive experiments validate the effectiveness and robustness of the proposed approach. According to our simulation results, the driving distance of the vehicle can be improved by approximately 0.2×.
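To make the approach described above concrete, the following is a minimal, illustrative sketch of a Dyna-style training loop that pairs a continuous-action, model-free policy (PPO in the paper) with a Gaussian process world model. It is not the authors' implementation: the `agent` interface (`act`, `store`, `update`, `reward_fn`) is a hypothetical PPO-like placeholder, the world model uses scikit-learn's GaussianProcessRegressor rather than the paper's model, the environment is assumed to follow the Gymnasium reset/step API, and the paper's bidirectional world-model/policy interaction mechanism is omitted.

```python
# Illustrative sketch (not the authors' code): a Dyna-style loop that alternates
# real-environment rollouts, Gaussian-process world-model fitting, and imagined
# rollouts for a continuous-action policy. The `agent` interface (act/store/
# update/reward_fn) is a hypothetical PPO-like placeholder; the environment is
# assumed to follow the Gymnasium reset/step API.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


class GPWorldModel:
    """Learns the transition s' = f(s, a) from a small buffer of real data."""

    def __init__(self):
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
        self.gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

    def fit(self, states, actions, next_states):
        X = np.concatenate([states, actions], axis=1)
        self.gp.fit(X, next_states)

    def predict(self, state, action):
        X = np.concatenate([state, action]).reshape(1, -1)
        mean, std = self.gp.predict(X, return_std=True)
        return mean[0], std[0]  # predictive mean and uncertainty of s'


def dyna_train(env, agent, n_iters=100, n_real=200, n_imagined=200):
    model = GPWorldModel()
    buffer = []  # (s, a, s') transitions collected from the real environment

    for _ in range(n_iters):
        # 1) Direct RL: interact with the real environment, then update the policy.
        s, _ = env.reset()
        for _ in range(n_real):
            a = agent.act(s)
            s_next, r, terminated, truncated, _ = env.step(a)
            buffer.append((s, a, s_next))
            agent.store(s, a, r, s_next)
            s = env.reset()[0] if (terminated or truncated) else s_next
        agent.update()  # e.g. one PPO update on the real transitions

        # 2) Model learning: fit the GP world model on the small real-data buffer.
        S, A, S_next = (np.array(x) for x in zip(*buffer))
        model.fit(S, A, S_next)

        # 3) Planning: generate imagined transitions with the world model and
        #    update the policy again on this synthetic experience.
        for _ in range(n_imagined):
            s_img = S[np.random.randint(len(S))]
            a_img = agent.act(s_img)
            s_pred, _std = model.predict(s_img, a_img)
            r_img = agent.reward_fn(s_img, a_img, s_pred)  # hypothetical helper
            agent.store(s_img, a_img, r_img, s_pred)
        agent.update()
```

The key design point this sketch tries to convey is that the GP world model is refit on a small buffer of real transitions each iteration, so the policy can be updated on cheap imagined rollouts in addition to costly real ones.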



Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62002369, and in part by the Scientific Research Project of the National University of Defense Technology under Grant ZK19-03.

Author information


Corresponding author

Correspondence to Wenqi Fang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Guanlin Wu, Wenqi Fang and Ji Wang contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, G., Fang, W., Wang, J. et al. Dyna-PPO reinforcement learning with Gaussian process for the continuous action decision-making in autonomous driving. Appl Intell 53, 16893–16907 (2023). https://doi.org/10.1007/s10489-022-04354-x
