Good-for-MDPs Automata for Probabilistic Analysis and Reinforcement Learning

Hahn, Ernst Moritz; Perez, Mateo; Schewe, Sven; Somenzi, Fabio; Trivedi, Ashutosh; Wojtczak, Dominik

doi:10.1007/978-3-030-45190-5_17

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12078)

Included in the following conference series:

International Conference on Tools and Algorithms for the Construction and Analysis of Systems

Abstract

We characterize the class of nondeterministic ω-automata that can be used for the analysis of finite Markov decision processes (MDPs). We call these automata ‘good-for-MDPs’ (GFM). We show that GFM automata are closed under classic simulation as well as under more powerful simulation relations that leverage properties of optimal control strategies for MDPs. This closure enables us to exploit state-space reduction techniques, such as those based on direct and delayed simulation, that guarantee simulation equivalence. We demonstrate the promise of GFM automata by defining a new class of automata with favorable properties (Büchi automata with low branching degree obtained through a simple construction) and show that going beyond limit-deterministic automata may significantly benefit reinforcement learning.
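To make the abstract's mention of simulation-based state-space reduction concrete, here is the classic greatest-fixpoint computation of direct simulation for a Büchi automaton with state-based acceptance, followed by grouping states into simulation-equivalence classes. This is a minimal textbook sketch, not the authors' implementation; the dictionary encoding of the transition relation and the function names `direct_simulation` and `quotient_classes` are hypothetical choices made for illustration.

```python
from itertools import product


def direct_simulation(states, alphabet, delta, accepting):
    """Greatest fixpoint of direct simulation on a Büchi automaton.

    Hypothetical encoding: delta maps (state, letter) to a set of successors.
    Returns the set of pairs (p, q) such that q directly simulates p.
    """
    # Start from all pairs respecting acceptance: if p is accepting, q must be too.
    rel = {(p, q) for p, q in product(states, repeat=2)
           if p not in accepting or q in accepting}
    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            # Every move p -a-> p2 must be matched by some move q -a-> q2
            # such that (p2, q2) is still in the relation.
            ok = all(
                any((p2, q2) in rel for q2 in delta.get((q, a), set()))
                for a in alphabet
                for p2 in delta.get((p, a), set())
            )
            if not ok:
                rel.discard((p, q))
                changed = True
    return rel


def quotient_classes(states, rel):
    """Group states that simulate each other (simulation equivalence)."""
    classes = []
    for s in states:
        for cls in classes:
            rep = cls[0]
            if (s, rep) in rel and (rep, s) in rel:
                cls.append(s)
                break
        else:
            classes.append([s])
    return classes


if __name__ == "__main__":
    # Automaton for "infinitely many a" over {a, b}; states 1 and 2 are redundant copies.
    states = [0, 1, 2]
    delta = {(0, "a"): {1, 2}, (0, "b"): {0},
             (1, "a"): {1}, (1, "b"): {0},
             (2, "a"): {2}, (2, "b"): {0}}
    rel = direct_simulation(states, ["a", "b"], delta, accepting={1, 2})
    print(quotient_classes(states, rel))  # [[0], [1, 2]]
```

Merging each class into a single state yields the direct-simulation quotient, which is known to preserve the language of a Büchi automaton. According to the abstract, the closure of GFM automata under simulation is what allows reductions of this kind to be applied without losing suitability for MDP analysis; the naive loop above favors clarity over efficiency.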

This work has been supported by the National Natural Science Foundation of China (Grant Nr. 61532019), EPSRC grants EP/M027287/1 and EP/P020909/1, and a CU Boulder RIO grant.

Author information

Authors and Affiliations

School of EEECS, Queen’s University Belfast, Belfast, UK
Ernst Moritz Hahn
State Key Laboratory of Computer Science, Institute of Software, CAS, Beijing, People’s Republic of China
Ernst Moritz Hahn
University of Colorado Boulder, Boulder, USA
Mateo Perez, Fabio Somenzi & Ashutosh Trivedi
University of Liverpool, Liverpool, UK
Sven Schewe & Dominik Wojtczak

Corresponding author

Correspondence to Ernst Moritz Hahn.

Editor information

Editors and Affiliations

Johannes Kepler University, Linz, Austria
Armin Biere
University of Birmingham, Birmingham, UK
David Parker

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this paper

Cite this paper

Hahn, E.M., Perez, M., Schewe, S., Somenzi, F., Trivedi, A., Wojtczak, D. (2020). Good-for-MDPs Automata for Probabilistic Analysis and Reinforcement Learning. In: Biere, A., Parker, D. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2020. Lecture Notes in Computer Science, vol 12078. Springer, Cham. https://doi.org/10.1007/978-3-030-45190-5_17

DOI: https://doi.org/10.1007/978-3-030-45190-5_17
Published: 17 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45189-9
Online ISBN: 978-3-030-45190-5
eBook Packages: Computer Science, Computer Science (R0)
