
Safe Exploration Method for Reinforcement Learning Under Existence of Disturbance

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

Recent rapid developments in reinforcement learning algorithms have opened up new possibilities in many fields. However, because these algorithms rely on exploration, we must take risk into account when applying them to safety-critical problems, especially in real environments. In this study, we address a safe exploration problem in reinforcement learning under the existence of disturbance. We define safety during learning as the satisfaction of constraint conditions explicitly defined in terms of the state, and we propose a safe exploration method that uses partial prior knowledge of the controlled object and the disturbance. The proposed method guarantees satisfaction of the explicit state constraints with a pre-specified probability even if the controlled object is exposed to a stochastic disturbance following a normal distribution. As theoretical results, we derive sufficient conditions for constructing the conservative, non-exploratory inputs used in the proposed method, and we prove that safety in the above sense is guaranteed. Furthermore, we illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.
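Since the abstract is the only technical description on this page, the following is a minimal, hypothetical sketch of the general idea it describes, not the authors' algorithm: an exploratory input is applied only if the explicit state constraint is predicted to hold at the next step with at least the pre-specified probability under a normally distributed disturbance; otherwise a conservative, non-exploratory input, assumed to be safe, is applied instead. The linear dynamics, the half-space constraint h^T x <= d, and all function and variable names below are assumptions made for illustration only.

```python
# Illustrative sketch only (hypothetical names), assuming linear dynamics
#   x_{k+1} = A x_k + B u_k + w_k,   w_k ~ N(mu_w, Sigma_w),
# and a single explicit state constraint h^T x <= d that must hold with
# probability at least 1 - delta at the next step.
import numpy as np
from statistics import NormalDist

def next_step_is_prob_safe(x, u, A, B, mu_w, Sigma_w, h, d, delta):
    """True if Pr[h^T x_{k+1} <= d] >= 1 - delta under the assumed model."""
    mean_next = A @ x + B @ u + mu_w          # mean of x_{k+1}
    std_h = np.sqrt(h @ Sigma_w @ h)          # std of the scalar h^T x_{k+1}
    z = NormalDist().inv_cdf(1.0 - delta)     # one-sided Gaussian quantile
    return h @ mean_next + z * std_h <= d

def choose_input(x, u_explore, u_conservative, A, B, mu_w, Sigma_w, h, d, delta):
    """Keep the exploratory input only if it preserves the chance constraint;
    otherwise fall back to the conservative (non-exploratory) input."""
    if next_step_is_prob_safe(x, u_explore, A, B, mu_w, Sigma_w, h, d, delta):
        return u_explore
    return u_conservative
```

In this sketch the fallback u_conservative plays the role of the conservative input whose existence the paper's sufficient conditions address; how such an input is constructed and how the probabilistic guarantee extends over the whole learning process are the theoretical contributions summarized in the abstract and detailed in the paper.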


Data Availability Statement

The source code to reproduce the results of this study is available at https://github.com/FujitsuResearch/SafeExploration.

Notes

1.

    Further comparison with other related works is given in Appendix A (electronic supplementary material).

2.

    Note that the means of \({\boldsymbol{\varepsilon }}_k\) and \({\boldsymbol{w}}_k\) are assumed to be \({\boldsymbol{0}}\) and \({\boldsymbol{\mu }}_w\), respectively.


Acknowledgements

The authors thank Yusuke Kato for fruitful discussions on the theoretical results of the proposed method. The authors also thank the anonymous reviewers for their valuable feedback. This work was partially supported by Fujitsu Laboratories Ltd. and JSPS KAKENHI Grant Number JP22H01513.

Author information


Correspondence to Yoshihiro Okawa.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 760 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Okawa, Y., Sasaki, T., Yanami, H., Namerikawa, T. (2023). Safe Exploration Method for Reinforcement Learning Under Existence of Disturbance. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_9


  • DOI: https://doi.org/10.1007/978-3-031-26412-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26411-5

  • Online ISBN: 978-3-031-26412-2

  • eBook Packages: Computer Science (R0)
