COOL-MC: A Comprehensive Tool for Reinforcement Learning and Model Checking

  • Conference paper
  • Part of: Dependable Software Engineering. Theories, Tools, and Applications (SETTA 2022)

Abstract

This paper presents COOL-MC, a tool that integrates state-of-the-art reinforcement learning (RL) and model checking. Specifically, the tool builds upon the OpenAI gym and the probabilistic model checker Storm. COOL-MC provides the following features: (1) a simulator to train RL policies in the OpenAI gym for Markov decision processes (MDPs) that are defined as input for Storm, (2) a new model builder for Storm, which uses callback functions to verify (neural network) RL policies, (3) formal abstractions that relate models and policies specified in OpenAI gym or Storm, and (4) algorithms to obtain bounds on the performance of so-called permissive policies. We describe the components and architecture of COOL-MC and demonstrate its features on multiple benchmark environments.

Download it from https://lava-lab.github.io/COOL-MC/.
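
To give a concrete flavour of feature (1), the sketch below shows the kind of training loop the simulator enables: an explicit MDP stands in for a PRISM/Storm model and is exposed through a Gym-style reset/step interface, and a tabular Q-learning agent is trained against it. The toy MDP, the MdpEnv wrapper, and the tabular agent are illustrative assumptions made for this example only, not COOL-MC's actual interface or the deep RL agents it supports.

    import random
    from collections import defaultdict

    # Toy MDP (illustrative only): (state, action) -> list of
    # (next_state, probability, reward) outcomes.
    MDP = {
        ("s0", "a"): [("s1", 0.8, 1.0), ("s0", 0.2, 0.0)],
        ("s0", "b"): [("s2", 1.0, 0.0)],
        ("s1", "a"): [("s1", 1.0, 1.0)],
        ("s1", "b"): [("s2", 1.0, 0.0)],
        ("s2", "a"): [("s2", 1.0, 0.0)],
        ("s2", "b"): [("s2", 1.0, 0.0)],
    }
    ACTIONS = ["a", "b"]

    class MdpEnv:
        """Gym-style wrapper around the explicit MDP above."""

        def reset(self):
            self.state = "s0"
            return self.state

        def step(self, action):
            outcomes = MDP[(self.state, action)]
            i = random.choices(range(len(outcomes)),
                               weights=[p for _, p, _ in outcomes])[0]
            next_state, _, reward = outcomes[i]
            self.state = next_state
            done = self.state == "s2"  # "s2" is an absorbing sink in this toy model
            return self.state, reward, done, {}

    # Tabular Q-learning as a stand-in for the (neural network) policies
    # that the tool trains and later verifies.
    q = defaultdict(float)
    env, alpha, gamma, eps = MdpEnv(), 0.1, 0.95, 0.1
    for _ in range(2000):
        state, done = env.reset(), False
        for _ in range(50):
            if random.random() < eps:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done, _ = env.step(action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break

    # The greedy policy extracted from q is what gets handed to the model checker.
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in ("s0", "s1", "s2")}
    print(policy)

Roughly, this is also where feature (2) connects: once a (possibly neural network) policy is fixed, the callback-based model builder can query it for its action choices while constructing the model, so that Storm checks the Markov chain induced by the policy rather than the full MDP.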

Notes

  1. MLflow is a platform to streamline ML development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models [41]. A minimal tracking sketch follows this list.

  2. We refer the interested reader to the tool's repository, https://github.com/LAVA-LAB/COOL-MC, for more experiments with these and other environments.
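
To illustrate the experiment-tracking role of MLflow mentioned in the first note, here is a minimal sketch using MLflow's Python tracking API; the run name, parameter names, and values are made up for this example.

    import mlflow

    # Record one training run: hyperparameters as params, results as metrics.
    with mlflow.start_run(run_name="example-rl-training-run"):
        mlflow.log_param("learning_rate", 0.1)
        mlflow.log_param("discount_factor", 0.95)
        for episode, total_reward in enumerate([0.2, 0.5, 0.8]):
            mlflow.log_metric("episode_reward", total_reward, step=episode)

Logged runs can later be browsed and compared in the MLflow UI, so the training configuration behind a given policy remains traceable.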

References

  1. Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: AAAI, pp. 2669–2678. AAAI Press (2018)

  2. Bacci, E., Parker, D.: Verified probabilistic policies for deep reinforcement learning. CoRR abs/2201.03698 (2022)

  3. Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press, Cambridge (2008)

  4. Boron, J., Darken, C.: Developing combat behavior through reinforcement learning in wargames and simulations. In: CoG, pp. 728–731. IEEE (2020)

  5. Brázdil, T., et al.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98–114. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_8

  6. Brockman, G., et al.: OpenAI gym. CoRR abs/1606.01540 (2016)

  7. Cassez, F., David, A., Fleury, E., Larsen, K.G., Lime, D.: Efficient on-the-fly algorithms for the analysis of timed games. In: Abadi, M., de Alfaro, L. (eds.) CONCUR 2005. LNCS, vol. 3653, pp. 66–80. Springer, Heidelberg (2005). https://doi.org/10.1007/11539452_9

  8. Clarke, E.M., Henzinger, T.A., Veith, H., Bloem, R. (eds.): Handbook of Model Checking. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-10575-8

  9. David, A., et al.: On time with minimal expected cost! In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 129–145. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_10

  10. David, A., Jensen, P.G., Larsen, K.G., Mikučionis, M., Taankvist, J.H.: Uppaal Stratego. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 206–211. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_16

  11. Dehnert, C., Junges, S., Katoen, J.-P., Volk, M.: A Storm is coming: a modern probabilistic model checker. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp. 592–600. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63390-9_31

  12. Dräger, K., Forejt, V., Kwiatkowska, M.Z., Parker, D., Ujma, M.: Permissive controller synthesis for probabilistic systems. Log. Methods Comput. Sci. 11(2) (2015)

  13. Farazi, N.P., Zou, B., Ahamed, T., Barua, L.: Deep reinforcement learning in transportation research: a review. Transp. Res. Interdisc. Perspect. 11, 100425 (2021)

  14. García, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16, 1437–1480 (2015)

  15. Gros, T.P., Hermanns, H., Hoffmann, J., Klauck, M., Köhl, M.A., Wolf, V.: MoGym: using formal models for training and verifying decision-making agents. In: Shoham, S., Vizel, Y. (eds.) CAV 2022. LNCS, vol. 13372, pp. 430–443. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-13188-2_21

  16. Gross, D., Jansen, N., Junges, S., Perez, G.A.: COOL-MC: a comprehensive tool for reinforcement learning and model checking. arXiv preprint arXiv:2209.07133 (2022)

  17. Gu, R., Jensen, P.G., Poulsen, D.B., Seceleanu, C., Enoiu, E., Lundqvist, K.: Verifiable strategy synthesis for multiple autonomous agents: a scalable approach. Int. J. Softw. Tools Technol. Transfer 24, 395–414 (2022). https://doi.org/10.1007/s10009-022-00657-z

  18. Hahn, E.M., Perez, M., Schewe, S., Somenzi, F., Trivedi, A., Wojtczak, D.: Omega-regular objectives in model-free reinforcement learning. In: Vojnar, T., Zhang, L. (eds.) TACAS 2019. LNCS, vol. 11427, pp. 395–412. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17462-0_27

  19. Hahn, E.M., Perez, M., Schewe, S., Somenzi, F., Trivedi, A., Wojtczak, D.: Mungojerrie: reinforcement learning of linear-time objectives. CoRR abs/2106.09161 (2021)

  20. Hartmanns, A., Hermanns, H.: The modest toolset: an integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 593–598. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_51

  21. Hartmanns, A., Klauck, M., Parker, D., Quatmann, T., Ruijters, E.: The quantitative verification benchmark set. In: Vojnar, T., Zhang, L. (eds.) TACAS 2019. LNCS, vol. 11427, pp. 344–350. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17462-0_20

  22. Hasanbeig, M., Kroening, D., Abate, A.: Towards verifiable and safe model-free reinforcement learning. In: OVERLAY@AI*IA. CEUR WS, vol. 2509, p. 1. CEUR-WS.org (2019)

  23. Hasanbeig, M., Kroening, D., Abate, A.: Deep reinforcement learning with temporal logics. In: Bertrand, N., Jansen, N. (eds.) FORMATS 2020. LNCS, vol. 12288, pp. 1–22. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-57628-8_1

  24. Jaeger, M., Jensen, P.G., Guldstrand Larsen, K., Legay, A., Sedwards, S., Taankvist, J.H.: Teaching stratego to play ball: optimal synthesis for continuous space MDPs. In: Chen, Y.-F., Cheng, C.-H., Esparza, J. (eds.) ATVA 2019. LNCS, vol. 11781, pp. 81–97. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31784-3_5

  25. Jansen, N., Könighofer, B., Junges, S., Serban, A., Bloem, R.: Safe reinforcement learning using probabilistic shields (invited paper). In: CONCUR. LIPIcs, vol. 171, pp. 3:1–3:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)

  26. Jin, P., Tian, J., Zhi, D., Wen, X., Zhang, M.: Trainify: a CEGAR-driven training and verification framework for safe deep reinforcement learning. In: Shoham, S., Vizel, Y. (eds.) CAV 2022. LNCS, vol. 13371, pp. 193–218. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-13185-1_10

  27. Jothimurugan, K., Bansal, S., Bastani, O., Alur, R.: Specification-guided learning of nash equilibria with high social welfare. In: Shoham, S., Vizel, Y. (eds.) CAV 2022. LNCS, vol. 13372, pp. 343–363. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-13188-2_17

  28. Junges, S., Jansen, N., Dehnert, C., Topcu, U., Katoen, J.-P.: Safety-constrained reinforcement learning for MDPs. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS, vol. 9636, pp. 130–146. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49674-9_8

  29. Junges, S., Jansen, N., Seshia, S.A.: Enforcing almost-sure reachability in POMDPs. In: Silva, A., Leino, K.R.M. (eds.) CAV 2021. LNCS, vol. 12760, pp. 602–625. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81688-9_28

  30. Kimura, H., Yamamura, M., Kobayashi, S.: Reinforcement learning by stochastic hill climbing on discounted reward. In: ICML, pp. 295–303. Morgan Kaufmann (1995)

  31. Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 2.0: a tool for probabilistic model checking. In: QEST, pp. 322–323. IEEE Computer Society (2004)

  32. Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17, 39:1–39:40 (2016)

  33. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  34. Nakabi, T.A., Toivanen, P.: Deep reinforcement learning for energy management in a microgrid with flexible demand. Sustain. Energy Grids Netw. 25, 100413 (2021)

  35. Strehl, A.L., Diuk, C., Littman, M.L.: Efficient structure learning in factored-state MDPs. In: AAAI, pp. 645–650. AAAI Press (2007)

  36. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)

  37. Vamplew, P., et al.: Scalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021). Auton. Agents Multi Agent Syst. 36(2) (2022). Article number: 41. https://doi.org/10.1007/s10458-022-09575-5

  38. Wang, Y., Roohi, N., West, M., Viswanathan, M., Dullerud, G.E.: Statistically model checking PCTL specifications on Markov decision processes via reinforcement learning. In: CDC, pp. 1392–1397. IEEE (2020)

  39. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992). https://doi.org/10.1007/BF00992698

  40. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992). https://doi.org/10.1007/BF00992696

  41. Zaharia, M., et al.: Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull. 41(4), 39–45 (2018)


Author information

Correspondence to Dennis Gross.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gross, D., Jansen, N., Junges, S., Pérez, G.A. (2022). COOL-MC: A Comprehensive Tool for Reinforcement Learning and Model Checking. In: Dong, W., Talpin, JP. (eds) Dependable Software Engineering. Theories, Tools, and Applications. SETTA 2022. Lecture Notes in Computer Science, vol 13649. Springer, Cham. https://doi.org/10.1007/978-3-031-21213-0_3

  • DOI: https://doi.org/10.1007/978-3-031-21213-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21212-3

  • Online ISBN: 978-3-031-21213-0

  • eBook Packages: Computer Science, Computer Science (R0)
