Regular Decision Processes for Grid Worlds

  • Conference paper
Artificial Intelligence and Machine Learning (BNAIC/Benelearn 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1530)

Abstract

Markov decision processes are typically used for sequential decision making under uncertainty. Many aspects, however, ranging from constrained or safe specifications to various kinds of temporal (non-Markovian) dependencies in task and reward structures, call for extensions. To that end, interest has grown in recent years in combining reinforcement learning with temporal logic, that is, in combining flexible behavior-learning methods with robust verification and guarantees. In this paper we describe an experimental investigation of the recently introduced regular decision processes, which support non-Markovian reward functions as well as non-Markovian transition functions. In particular, we provide a tool chain for regular decision processes, algorithmic extensions relating to online, incremental learning, an empirical evaluation of model-free and model-based solution algorithms, and applications in regular, but non-Markovian, grid worlds.
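
To make the non-Markovian ingredient concrete, the sketch below shows the standard construction that also underlies regular decision processes: a history-dependent ("regular") reward is tracked by a small finite automaton whose state is carried along with the grid position, after which ordinary tabular Q-learning can be applied to the product state. The grid layout, the "key before goal" task, and all parameter values are invented for illustration; this is a minimal sketch under those assumptions, not the paper's tool chain or its algorithms.

    import random
    from collections import defaultdict

    # Hypothetical 1-D grid world: the agent starts in the middle, must first
    # visit the "key" cell on the far left, and only then receives a reward for
    # reaching the goal cell on the far right -- a reward that is non-Markovian
    # in the position alone.
    N_CELLS, START, KEY_CELL, GOAL_CELL = 8, 3, 0, 7
    ACTIONS = (-1, +1)  # move left / move right

    def monitor_step(q, cell):
        """Two-state reward monitor: q=0 'no key yet', q=1 'key collected'."""
        return 1 if (q == 1 or cell == KEY_CELL) else 0

    def env_step(cell, q, a):
        cell = min(max(cell + a, 0), N_CELLS - 1)  # Markovian grid dynamics
        q = monitor_step(q, cell)                  # automaton summarizes the history
        reward = 1.0 if (cell == GOAL_CELL and q == 1) else 0.0
        done = cell == GOAL_CELL
        return cell, q, reward, done

    # Tabular Q-learning on the product state (cell, q); a uniformly random
    # behavior policy suffices here because Q-learning is off-policy.
    Q = defaultdict(float)
    alpha, gamma = 0.1, 0.95
    for _ in range(5000):
        cell, q = START, 0
        for _ in range(100):  # step limit per episode
            s, a = (cell, q), random.choice(ACTIONS)
            cell, q, r, done = env_step(cell, q, a)
            target = r if done else r + gamma * max(Q[((cell, q), b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            if done:
                break

    greedy = {qq: [max(ACTIONS, key=lambda b: Q[((c, qq), b)]) for c in range(N_CELLS)]
              for qq in (0, 1)}
    print("greedy action per cell (automaton state 0 / 1):", greedy)

The same construction extends to richer automata, for instance ones obtained from temporal-logic formulas over finite traces, which is where several of the works cited in the references come in.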

Notes

  1.

    In the current paper there is no room to highlight it, but model checkers such as Storm (https://www.stormchecker.org/) can be employed for shaping and shielding purposes (and more) in this tool chain, cf. [21]; a minimal illustration of the shielding idea follows these notes.

  2.

    https://github.com/mdaines/viz.js.
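
As a small illustration of the shielding idea mentioned in note 1 (and not of Storm's actual interface): a set of forbidden state-action pairs, assumed to have been computed offline by model checking a safety property, is consulted to filter the agent's action before it reaches the environment. The states, actions, and the UNSAFE set below are hypothetical.

    import random

    ACTIONS = (-1, +1)
    UNSAFE = {(3, +1), (4, -1)}  # hypothetical forbidden (state, action) pairs

    def shielded_action(state, preferred):
        """Return the agent's preferred action unless the shield forbids it here."""
        if (state, preferred) not in UNSAFE:
            return preferred
        safe = [a for a in ACTIONS if (state, a) not in UNSAFE]
        return random.choice(safe) if safe else preferred  # no safe alternative: pass through

    print(shielded_action(3, +1))  # overridden by the shield
    print(shielded_action(2, +1))  # allowed, passes through unchanged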

References

  1. Abadi, E., Brafman, R.I.: Learning and solving regular decision processes. In: IJCAI (2020)

  2. Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: AAAI (2018)

  3. Bacchus, F., Boutilier, C., Grove, A.: Rewarding behaviors. In: AAAI (1996)

  4. Brafman, R.I., De Giacomo, G., Patrizi, F.: LTLf/LDLf non-Markovian rewards. In: AAAI (2018)

  5. Brafman, R.I., De Giacomo, G.: Planning for LTLf/LDLf goals in non-Markovian fully observable nondeterministic domains. In: IJCAI (2019)

  6. Brafman, R.I., De Giacomo, G.: Regular decision processes: a model for non-Markovian domains. In: IJCAI (2019)

  7. Camacho, A., Icarte, R.T., Klassen, T.Q., Valenzano, R.A., McIlraith, S.A.: LTL and beyond: formal languages for reward function specification in reinforcement learning. In: IJCAI (2019)

  8. Camacho, A., McIlraith, S.A.: Learning interpretable models expressed in linear temporal logic. In: ICAPS (2019)

  9. De Giacomo, G., Favorito, M., Iocchi, L., Patrizi, F., Ronca, A.: Temporal logic monitoring rewards via transducers. In: KR (2020)

  10. De Giacomo, G., Iocchi, L., Favorito, M., Patrizi, F.: Reinforcement learning for LTLf/LDLf goals. arXiv preprint arXiv:1807.06333 (2018)

  11. De Giacomo, G., Iocchi, L., Favorito, M., Patrizi, F.: Foundations for restraining bolts: reinforcement learning with LTLf/LDLf restraining specifications. In: ICAPS (2019)

  12. De Giacomo, G., Vardi, M.Y.: Linear temporal logic and linear dynamic logic on finite traces. In: AAAI (2013)

  13. Fulton, N., Platzer, A.: Safe reinforcement learning via formal methods: toward safe control through proof and learning. In: AAAI (2018)

  14. Furelos-Blanco, D., Law, M., Jonsson, A., Broda, K., Russo, A.: Induction and exploitation of subgoal automata for reinforcement learning. J. Artif. Intell. Res. 70, 1031–1116 (2021)

  15. García, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16(1), 1437–1480 (2015)

  16. Giaquinta, R., Hoffmann, R., Ireland, M., Miller, A., Norman, G.: Strategy synthesis for autonomous agents using PRISM. In: Dutle, A., Muñoz, C., Narkawicz, A. (eds.) NFM 2018. LNCS, vol. 10811, pp. 220–236. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77935-5_16

  17. Hassouni, A., Hoogendoorn, M., van Otterlo, M., Barbaro, E.: Personalization of health interventions using cluster-based reinforcement learning. In: Miller, T., Oren, N., Sakurai, Y., Noda, I., Savarimuthu, B.T.R., Cao Son, T. (eds.) PRIMA 2018. LNCS (LNAI), vol. 11224, pp. 467–475. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03098-8_31

  18. Jothimurugan, K., Alur, R., Bastani, O.: A composable specification language for reinforcement learning tasks. In: NeurIPS (2019)

  19. Kasenberg, D., Thielstrom, R., Scheutz, M.: Generating explanations for temporal logic planner decisions. In: ICAPS (2020)

  20. Kim, J., Muise, C., Shah, A., Agarwal, S., Shah, J.: Bayesian inference of linear temporal logic specifications for contrastive explanations. In: IJCAI (2019)

  21. Lenaers, N.: An empirical study on regular decision processes for grid worlds. Master’s thesis, Department of Computer Science, Faculty of Science, Open University (2021)

  22. Li, X., Vasile, C.I., Belta, C.: Reinforcement learning with temporal logic rewards. In: IROS (2017)

  23. Liao, H.C.: A survey of reinforcement learning with temporal logic rewards (2020)

  24. Liao, S.M.: Ethics of Artificial Intelligence. Oxford University Press, Oxford (2020)

  25. Littman, M.L., Topcu, U., Fu, J., Isbell, C., Wen, M., MacGlashan, J.: Environment-independent task specifications via GLTL. arXiv preprint arXiv:1704.04341 (2017)

  26. Mirhoseini, A., et al.: A graph placement methodology for fast chip design. Nature 594(7862), 207–212 (2021)

  27. Ng, A.Y., Harada, D., Russell, S.J.: Policy invariance under reward transformations: theory and application to reward shaping. In: ICML (1999)

  28. van Otterlo, M.: Ethics and the value(s) of artificial intelligence. Nieuw Archief voor Wiskunde 5(19), 3 (2018)

  29. Pnueli, A.: The temporal logic of programs. In: Proceedings of the 18th Annual Symposium on Foundations of Computer Science (1977)

  30. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken (1994)

  31. Romeo, Í.Í., Lohstroh, M., Iannopollo, A., Lee, E.A., Sangiovanni-Vincentelli, A.: A metric for linear temporal logic. arXiv preprint arXiv:1812.03923 (2018)

  32. Silver, D., et al.: Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  33. Spaan, M.T.J.: Partially observable Markov decision processes. In: Wiering, M.A., van Otterlo, M. (eds.) Reinforcement Learning. ALO, vol. 12, pp. 387–414. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27645-3_12

  34. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (2018)

  35. Thiébaux, S., Gretton, C., Slaney, J., Price, D., Kabanza, F.: Decision-theoretic planning with non-Markovian rewards. J. Artif. Intell. Res. 25, 17–74 (2006)

  36. van Otterlo, M.: The Logic of Adaptive Behavior. Frontiers in Artificial Intelligence and Applications, vol. 192. IOS Press, Amsterdam (2009)

  37. Wang, H., Dong, S., Shao, L.: Measuring structural similarities in finite MDPs. In: IJCAI (2019)

  38. Wang, H., et al.: Deep reinforcement learning: a survey. Front. Inf. Technol. Electron. Eng. 21, 1726–1744 (2020). https://doi.org/10.1631/FITEE.1900533

  39. Wiering, M.A., van Otterlo, M.: Reinforcement Learning. Adaptation, Learning, and Optimization, vol. 12. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27645-3

Author information

Corresponding author

Correspondence to Martijn van Otterlo.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Lenaers, N., van Otterlo, M. (2022). Regular Decision Processes for Grid Worlds. In: Leiva, L.A., Pruski, C., Markovich, R., Najjar, A., Schommer, C. (eds) Artificial Intelligence and Machine Learning. BNAIC/Benelearn 2021. Communications in Computer and Information Science, vol 1530. Springer, Cham. https://doi.org/10.1007/978-3-030-93842-0_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-93842-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93841-3

  • Online ISBN: 978-3-030-93842-0

  • eBook Packages: Computer Science, Computer Science (R0)
