
Dyna-PPO reinforcement learning with Gaussian process for the continuous action decision-making in autonomous driving


Abstract

Recent years have witnessed rapid development in autonomous driving. Model-based and model-free reinforcement learning are two popular approaches to learning driving policies, and each has its own advantages for achieving a good driving experience. The Dyna framework is a promising way to combine these advantages and improve efficiency and performance. Unfortunately, the classical Dyna framework cannot handle continuous actions in reinforcement learning. In addition, the interaction between the world model and the model-free reinforcement learning agent remains unidirectional and limited to the data level. To further improve the effectiveness and efficiency of driving policy learning, this paper proposes a novel Gaussian-process-based Dyna-PPO approach. The Gaussian process model, which is analytically tractable and well suited to small-sample problems, is introduced to build the world model. In addition, we design a mechanism that realizes bidirectional interaction between the world model and the policy model. Extensive experiments validate the effectiveness and robustness of the proposed approach. According to our simulation results, the driving distance of the vehicle can be improved by approximately 0.2×.
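To make the approach described above concrete, the following is a minimal, illustrative sketch of a Dyna-style training loop that pairs a continuous-action, model-free policy (PPO in the paper) with a Gaussian process world model. It is not the authors' implementation: the `agent` interface (`act`, `store`, `update`, `reward_fn`) is a hypothetical PPO-like placeholder, the world model uses scikit-learn's GaussianProcessRegressor rather than the paper's model, the environment is assumed to follow the Gymnasium reset/step API, and the paper's bidirectional world-model/policy interaction mechanism is omitted.

```python
# Illustrative sketch (not the authors' code): a Dyna-style loop that alternates
# real-environment rollouts, Gaussian-process world-model fitting, and imagined
# rollouts for a continuous-action policy. The `agent` interface (act/store/
# update/reward_fn) is a hypothetical PPO-like placeholder; the environment is
# assumed to follow the Gymnasium reset/step API.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


class GPWorldModel:
    """Learns the transition s' = f(s, a) from a small buffer of real data."""

    def __init__(self):
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
        self.gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

    def fit(self, states, actions, next_states):
        X = np.concatenate([states, actions], axis=1)
        self.gp.fit(X, next_states)

    def predict(self, state, action):
        X = np.concatenate([state, action]).reshape(1, -1)
        mean, std = self.gp.predict(X, return_std=True)
        return mean[0], std[0]  # predictive mean and uncertainty of s'


def dyna_train(env, agent, n_iters=100, n_real=200, n_imagined=200):
    model = GPWorldModel()
    buffer = []  # (s, a, s') transitions collected from the real environment

    for _ in range(n_iters):
        # 1) Direct RL: interact with the real environment, then update the policy.
        s, _ = env.reset()
        for _ in range(n_real):
            a = agent.act(s)
            s_next, r, terminated, truncated, _ = env.step(a)
            buffer.append((s, a, s_next))
            agent.store(s, a, r, s_next)
            s = env.reset()[0] if (terminated or truncated) else s_next
        agent.update()  # e.g. one PPO update on the real transitions

        # 2) Model learning: fit the GP world model on the small real-data buffer.
        S, A, S_next = (np.array(x) for x in zip(*buffer))
        model.fit(S, A, S_next)

        # 3) Planning: generate imagined transitions with the world model and
        #    update the policy again on this synthetic experience.
        for _ in range(n_imagined):
            s_img = S[np.random.randint(len(S))]
            a_img = agent.act(s_img)
            s_pred, _std = model.predict(s_img, a_img)
            r_img = agent.reward_fn(s_img, a_img, s_pred)  # hypothetical helper
            agent.store(s_img, a_img, r_img, s_pred)
        agent.update()
```

The key design point this sketch tries to convey is that the GP world model is refit on a small buffer of real transitions each iteration, so the policy can be updated on cheap imagined rollouts in addition to costly real ones.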



Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62002369, and in part by the Scientific Research Project of the National University of Defense Technology under Grant ZK19-03.

Author information


Corresponding author

Correspondence to Wenqi Fang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Guanlin Wu, Wenqi Fang and Ji Wang contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, G., Fang, W., Wang, J. et al. Dyna-PPO reinforcement learning with Gaussian process for the continuous action decision-making in autonomous driving. Appl Intell 53, 16893–16907 (2023). https://doi.org/10.1007/s10489-022-04354-x
