
Deep Reinforcement Learning

Chapter in Machine Learning Safety

Abstract

This chapter considers another important application of machine learning to robotics: the use of a deep reinforcement learning (DRL) agent for robot motion planning and control. We first present some preliminaries on training a DRL policy in robotics, followed by a discussion of sample efficiency concerning the sufficiency of training (Sect. 13.3) and an introduction to several statistical methods for evaluation (Sect. 13.4). Afterwards, we discuss how to formally express the relevant properties (Sect. 13.5) and then focus on reusing the verification tools developed for convolutional neural networks to work with deep reinforcement learning, by considering the verification of policy generalisation (Sect. 13.6), the verification of state-based policy robustness (Sect. 13.7), and the verification of temporal policy robustness (Sect. 13.8). In addition, we discuss how to address the well-known sim-to-real challenge in robotics with these verification techniques (Sect. 13.9).
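To make the notion of state-based policy robustness (Sect. 13.7) concrete, the following is a minimal, illustrative sketch in Python. A toy linear policy stands in for a trained DRL network, and robustness is probed empirically by sampling perturbations within an L-infinity ball around a state. The chapter's verification tools would instead certify the property over the entire ball; the weight matrix, the state, and the perturbation bound here are invented purely for illustration.

```python
import numpy as np

# Toy deterministic policy: a single linear layer followed by argmax,
# standing in for a trained DRL policy network (4-dim state, 3 actions).
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.5]])

def policy(state):
    """Return the greedy (argmax) action for a given state."""
    return int(np.argmax(state @ W))

def empirically_robust(state, epsilon, n_samples=1000, seed=0):
    """Sample random perturbations with L-infinity norm at most epsilon
    and check that the chosen action never changes. This is a statistical
    probe; a formal verifier would certify the whole epsilon-ball."""
    rng = np.random.default_rng(seed)
    a0 = policy(state)
    for _ in range(n_samples):
        delta = rng.uniform(-epsilon, epsilon, size=state.shape)
        if policy(state + delta) != a0:
            return False
    return True

s = np.array([1.0, 0.2, 0.1, 0.0])
# With epsilon = 0.01, the margin between the top two logits (0.8) exceeds
# the worst-case logit shift, so the action is provably stable here.
print(empirically_robust(s, epsilon=0.01))  # True
```

Note that a sampling-based check of this kind can only falsify robustness, never prove it; the reachability-style tools discussed later in the chapter close that gap by bounding the policy's output over the whole perturbation set.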




Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Huang, X., Jin, G., Ruan, W. (2023). Deep Reinforcement Learning. In: Machine Learning Safety. Artificial Intelligence: Foundations, Theory, and Algorithms. Springer, Singapore. https://doi.org/10.1007/978-981-19-6814-3_13


  • DOI: https://doi.org/10.1007/978-981-19-6814-3_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-6813-6

  • Online ISBN: 978-981-19-6814-3

  • eBook Packages: Computer Science (R0)
