Dynamic Shielding for Reinforcement Learning in Black-Box Environments

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13505)


It is challenging to use reinforcement learning (RL) in cyber-physical systems because of the lack of safety guarantees during learning. Although various techniques have been proposed to reduce undesired behaviors during learning, most of them require prior system knowledge, which limits their applicability. This paper aims to reduce undesired behaviors during learning without requiring any prior system knowledge. We propose dynamic shielding: an extension of the model-based safe RL technique known as shielding, using automata learning. Dynamic shielding constructs an approximate system model in parallel with RL, using a variant of the RPNI algorithm, and suppresses undesired exploration with a shield constructed from the learned model. Through this combination, potentially unsafe actions can be foreseen before the agent experiences them. Our experiments show that the dynamic shield significantly decreases the number of undesired events during training.
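To illustrate the idea, here is a minimal sketch of a preemptive shield. All names (`LearnedDFA`, `PreemptiveShield`, the toy transition table) are hypothetical and stand in for the RPNI-learned safety automaton described above; this is not the paper's implementation. The shield tracks the state of the learned automaton and, before each step, filters out actions that the model predicts lead to an unsafe state.

```python
import random

class LearnedDFA:
    """Toy stand-in for an RPNI-learned safety automaton."""
    def __init__(self, transitions, unsafe_states, initial=0):
        self.transitions = transitions      # {(state, action): next_state}
        self.unsafe_states = unsafe_states  # states the model deems unsafe
        self.initial = initial

    def step(self, state, action):
        # Unseen transitions are treated optimistically: in a black-box
        # setting we cannot rule out behavior we have never observed.
        return self.transitions.get((state, action), state)

class PreemptiveShield:
    """Restricts the agent's choices before it acts (preemptive variant)."""
    def __init__(self, dfa):
        self.dfa = dfa
        self.state = dfa.initial

    def safe_actions(self, actions):
        """Return the actions the learned model does not flag as unsafe."""
        allowed = [a for a in actions
                   if self.dfa.step(self.state, a) not in self.dfa.unsafe_states]
        # If the approximate model forbids everything, fall back to all
        # actions rather than deadlock the agent.
        return allowed or list(actions)

    def update(self, action):
        self.state = self.dfa.step(self.state, action)

# Usage: state 2 is unsafe; from state 0, action "b" would reach it,
# so only "a" is offered to the agent.
dfa = LearnedDFA({(0, "a"): 1, (0, "b"): 2, (1, "a"): 0}, unsafe_states={2})
shield = PreemptiveShield(dfa)
choice = random.choice(shield.safe_actions(["a", "b"]))  # always "a" here
shield.update(choice)
```

In the dynamic setting, the automaton itself is rebuilt from observed traces as learning proceeds, so the shield's predictions sharpen over time.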


  • Reinforcement learning
  • Shielding
  • Automata learning

S. Pruekprasert and T. Takisaka—The work was done during the employment of S.P. and T.T. at NII, Tokyo.



  1.

    The shield we use in this paper is the variant called preemptive shield in [1]. It is straightforward to apply our framework to the classic shield called post-posed shield.

  2.

    The artifact is publicly available at


  1. Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: McIlraith, S.A., Weinberger, K.Q. (eds.) Proceedings of the AAAI 2018, pp. 2669–2678. AAAI Press (2018)


  2. Avni, G., Bloem, R., Chatterjee, K., Henzinger, T.A., Könighofer, B., Pranger, S.: Run-time optimization for learned controllers through quantitative games. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11561, pp. 630–649. Springer, Cham (2019).


  3. Bharadwaj, S., Bloem, R., Dimitrova, R., Könighofer, B., Topcu, U.: Synthesis of minimum-cost shields for multi-agent systems. In: Proceedings of the ACC 2019, pp. 1048–1055. IEEE (2019)


  4. Bloem, R., Jensen, P.G., Könighofer, B., Larsen, K.G., Lorber, F., Palmisano, A.: It’s time to play safe: shield synthesis for timed systems. CoRR abs/2006.16688 (2020)


  5. Bloem, R., Könighofer, B., Könighofer, R., Wang, C.: Shield synthesis: runtime enforcement for reactive systems. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 533–548. Springer, Heidelberg (2015).


  6. Bouton, M., Karlsson, J., Nakhaei, A., Fujimura, K., Kochenderfer, M.J., Tumova, J.: Reinforcement learning with probabilistic guarantees for autonomous driving. CoRR abs/1904.07189 (2019)


  7. Brockman, G., et al.: OpenAI Gym. CoRR abs/1606.01540 (2016)


  8. Cheng, R., Orosz, G., Murray, R.M., Burdick, J.W.: End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. In: Proceedings of the AAAI 2019, pp. 3387–3395. AAAI Press (2019)


  9. Chevalier-Boisvert, M.: Gym-MiniWorld Environment for OpenAI Gym (2018).

  10. García, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16, 1437–1480 (2015)


  11. Hasanbeig, M., Abate, A., Kroening, D.: Cautious reinforcement learning with logical constraints. In: Seghrouchni, A.E.F., Sukthankar, G., An, B., Yorke-Smith, N. (eds.) Proceedings of the AAMAS 2020, pp. 483–491. IFAAMS (2020)


  12. Hunt, N., Fulton, N., Magliacane, S., Hoang, T.N., Das, S., Solar-Lezama, A.: Verifiably safe exploration for end-to-end reinforcement learning. In: Bogomolov, S., Jungers, R.M. (eds.) Proceedings of the HSCC 2021, pp. 14:1–14:11. ACM (2021)


  13. Isberner, M., Howar, F., Steffen, B.: The open-source LearnLib - a framework for active automata learning. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 487–495. Springer, Cham (2015).


  14. Jansen, N., Könighofer, B., Junges, S., Serban, A., Bloem, R.: Safe reinforcement learning using probabilistic shields (invited paper). In: Konnov, I., Kovács, L. (eds.) Proceedings of the CONCUR 2020. LIPIcs, vol. 171, pp. 3:1–3:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)


  15. Kupferman, O., Lampert, R.: On the construction of fine automata for safety properties. In: Graf, S., Zhang, W. (eds.) ATVA 2006. LNCS, vol. 4218, pp. 110–124. Springer, Heidelberg (2006).


  16. Lang, K.J., Pearlmutter, B.A., Price, R.A.: Results of the Abbadingo one DFA learning competition and a new evidence-driven state merging algorithm. In: Honavar, V., Slutzki, G. (eds.) ICGI 1998. LNCS, vol. 1433, pp. 1–12. Springer, Heidelberg (1998).


  17. López, D., García, P.: On the inference of finite state automata from positive and negative data. In: Heinz, J., Sempere, J.M. (eds.) Topics in Grammatical Inference, pp. 73–112. Springer, Heidelberg (2016).


  18. Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G., Nielsen, B.: Learning Markov decision processes for model checking. In: Fahrenberg, U., Legay, A., Thrane, C.R. (eds.) Proceedings of the QFM 2012. EPTCS, vol. 103, pp. 49–63 (2012)


  19. Mnih, V., et al.: Playing Atari with deep reinforcement learning. CoRR abs/1312.5602 (2013)


  20. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)


  21. Oncina, J., García, P.: Identifying regular languages in polynomial time. Series in Machine Perception and Artificial Intelligence, pp. 99–108 (1993)


  22. Plappert, M.: Keras-RL (2016).

  23. Pranger, S., Könighofer, B., Tappler, M., Deixelberger, M., Jansen, N., Bloem, R.: Adaptive shielding under uncertainty. In: Proceedings of the ACC 2021, pp. 3467–3474. IEEE (2021)


  24. Raffin, A., Hill, A., Ernestus, M., Gleave, A., Kanervisto, A., Dormann, N.: Stable-Baselines3 (2019).

  25. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. CoRR abs/1707.06347 (2017)


  26. Silver, D., et al.: Mastering the game of go without human knowledge. Nature 550(7676), 354–359 (2017)


  27. Sutton, R.S., Barto, A.G.: Reinforcement Learning - An Introduction. Adaptive Computation and Machine Learning. MIT Press (1998)


  28. Wu, M., Wang, J., Deshmukh, J., Wang, C.: Shield synthesis for real: Enforcing safety in cyber-physical systems. In: Barrett, C.W., Yang, J. (eds.) Proceedings of the FMCAD 2019, pp. 129–137. IEEE (2019)




This work is partially supported by JST ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603). Masaki Waga is also supported by JST ACT-X Grant No. JPMJAX200U. Stefan Klikovits is also supported by JSPS Grant-in-Aid No. 20K23334. Sasinee Pruekprasert is also supported by JSPS Grant-in-Aid No. 21K14191. Toru Takisaka is also supported by NSFC Research Fund for International Young Scientists No. 62150410437.

Author information



Corresponding author

Correspondence to Masaki Waga.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Waga, M., Castellano, E., Pruekprasert, S., Klikovits, S., Takisaka, T., Hasuo, I. (2022). Dynamic Shielding for Reinforcement Learning in Black-Box Environments. In: Bouajjani, A., Holík, L., Wu, Z. (eds) Automated Technology for Verification and Analysis. ATVA 2022. Lecture Notes in Computer Science, vol 13505. Springer, Cham.



  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19991-2

  • Online ISBN: 978-3-031-19992-9

  • eBook Packages: Computer Science (R0)