Tableaux for Policy Synthesis for MDPs with PCTL* Constraints

Baumgartner, Peter; Thiébaux, Sylvie; Trevizan, Felipe

doi:10.1007/978-3-319-66902-1_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10501)

Included in the following conference series:

International Conference on Automated Reasoning with Analytic Tableaux and Related Methods

Abstract

Markov decision processes (MDPs) are the standard formalism for modelling sequential decision making in stochastic environments. Policy synthesis addresses the problem of how to control or limit the decisions an agent makes so that a given specification is met. In this paper we consider PCTL*, the probabilistic counterpart of CTL*, as the specification language. Because in general the policy synthesis problem for PCTL* is undecidable, we restrict to policies whose execution history memory is finitely bounded a priori. Surprisingly, no algorithm for policy synthesis for this natural and expressive framework has been developed so far. We close this gap and describe a tableau-based algorithm that, given an MDP and a PCTL* specification, derives in a non-deterministic way a system of (possibly nonlinear) equalities and inequalities. The solutions of this system, if any, describe the desired (stochastic) policies. Our main result in this paper is the correctness of our method, i.e., soundness, completeness and termination.
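To make the link between stochastic policies and nonlinear constraints concrete, here is a minimal sketch on a toy MDP. It is not the tableau calculus of the paper. The MDP `TOY_MDP`, the helper names, and the threshold 0.7 are all assumptions chosen for illustration, and the policy is memoryless (a single memory mode). The policy picks action `a` with probability `p` in state `s0`; the probability `x` of eventually reaching `goal` then satisfies `x = 0.5*p + (1 - p)*(0.5*x + 0.4)`, which contains the product `p*x` and is therefore nonlinear when `p` and `x` are both unknowns.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (successor, probability)
MDP = Dict[str, Dict[str, List[Tuple[str, float]]]]
# policy[state][action] = probability of choosing that action
Policy = Dict[str, Dict[str, float]]

# Hypothetical toy MDP, not taken from the paper.
TOY_MDP: MDP = {
    "s0": {
        "a": [("goal", 0.5), ("fail", 0.5)],
        "b": [("s0", 0.5), ("goal", 0.4), ("fail", 0.1)],
    },
    "goal": {"stay": [("goal", 1.0)]},
    "fail": {"stay": [("fail", 1.0)]},
}


def policy_with(p: float) -> Policy:
    """Memoryless stochastic policy: in s0 choose 'a' with probability p."""
    return {
        "s0": {"a": p, "b": 1.0 - p},
        "goal": {"stay": 1.0},
        "fail": {"stay": 1.0},
    }


def reach_probability(mdp: MDP, policy: Policy, start: str, goal: str,
                      sweeps: int = 200) -> float:
    """Approximate the probability of eventually reaching `goal` from
    `start` by fixed-point iteration on the Markov chain induced by
    `policy`."""
    x = {s: 1.0 if s == goal else 0.0 for s in mdp}
    for _ in range(sweeps):
        new_x = {}
        for s in mdp:
            if s == goal:
                new_x[s] = 1.0
                continue
            new_x[s] = sum(
                policy[s][a] * sum(pr * x[t] for t, pr in successors)
                for a, successors in mdp[s].items()
            )
        x = new_x
    return x[start]


def closed_form(p: float) -> float:
    """Solution of x = 0.5*p + (1 - p)*(0.5*x + 0.4) for x."""
    return (0.4 + 0.1 * p) / (0.5 + 0.5 * p)


def meets_threshold(p: float, bound: float) -> bool:
    """Check a reachability constraint 'reach goal with probability >= bound'."""
    return reach_probability(TOY_MDP, policy_with(p), "s0", "goal") >= bound - 1e-9
```

In this toy instance, solving the constraint for the policy parameter gives `p <= 0.2` for the bound 0.7, so any such `p` describes a satisfying stochastic policy. The paper's method instead collects such equalities and inequalities along tableau branches and hands them to a solver.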

Notes

1. http://projects.coin-or.org/.
2. By the semantics of the \(\boldsymbol{\mathsf{P}}\)-operator, the sub-derivation has to start from \(\langle \mathrm{start}(s), s\rangle\), not \(\langle m, s\rangle\).
3. The argument is standard for calculi based on formula expansion, as embodied in the \(\boldsymbol{\mathsf{U}}\) and \(\lnot \boldsymbol{\mathsf{U}}\) rules: every set of formulas obtainable by these rules is a subset of an a priori determined finite set of formulas. This set consists of all subformulas of the given formula closed under negation and other operators. Any infinite branch would hence have to repeat one of these sets infinitely often, which is impossible with the loop rules. Moreover, the state set S and the mode set M are finite, so the other rules do not cause problems either.
4. In terms of the resulting program, \((x_u)_{\langle m,s\rangle}^{\psi}\) is not constrained to any specific value in [0..1]. This can be shown by “substituting in” the equalities in \(\Gamma_{\mathrm{final}}\) for the probabilities of the pivots in the subtree below u and arithmetic simplifications.
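The finiteness argument in note 3 can be illustrated with a small sketch. It computes a closure consisting of all subformulas of a formula together with their negations, which is a simplifying assumption: the note also mentions "other operators", and those are not modelled here. The tuple encoding of formulas and the function names are hypothetical.

```python
from typing import Set, Tuple, Union

# A formula is an atom name, or a tuple (operator, operand, ...).
Formula = Union[str, Tuple]


def subformulas(f: Formula) -> Set[Formula]:
    """All subformulas of f, including f itself."""
    result = {f}
    if isinstance(f, tuple):
        for arg in f[1:]:
            result |= subformulas(arg)
    return result


def negate(f: Formula) -> Formula:
    """Negation that removes a leading 'not' instead of stacking a second one."""
    if isinstance(f, tuple) and f[0] == "not":
        return f[1]
    return ("not", f)


def closure(f: Formula) -> Set[Formula]:
    """Subformulas of f plus their negations; at most twice as many
    elements as there are subformulas, hence finite."""
    subs = subformulas(f)
    return subs | {negate(g) for g in subs}


# Example formula: a U (not b)
phi: Formula = ("U", "a", ("not", "b"))
```

Every set of formulas drawn from `closure(phi)` is one of finitely many subsets, which is the counting step the termination argument relies on.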

Acknowledgements

This research was funded by AFOSR grant FA2386-15-1-4015. We would also like to thank the anonymous reviewers for their constructive and helpful comments.

Author information

Authors and Affiliations

Data61/CSIRO and Research School of Computer Science, ANU, Canberra, Australia
Peter Baumgartner, Sylvie Thiébaux & Felipe Trevizan

Corresponding author

Correspondence to Peter Baumgartner.

Editor information

Editors and Affiliations

University of Manchester, Manchester, United Kingdom
Renate A. Schmidt
University of Brasília, Brasília D.F., Brazil
Cláudia Nalon

Cite this paper

Baumgartner, P., Thiébaux, S., Trevizan, F. (2017). Tableaux for Policy Synthesis for MDPs with PCTL* Constraints. In: Schmidt, R., Nalon, C. (eds) Automated Reasoning with Analytic Tableaux and Related Methods. TABLEAUX 2017. Lecture Notes in Computer Science, vol 10501. Springer, Cham. https://doi.org/10.1007/978-3-319-66902-1_11

DOI: https://doi.org/10.1007/978-3-319-66902-1_11
Published: 30 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66901-4
Online ISBN: 978-3-319-66902-1
eBook Packages: Computer Science, Computer Science (R0)
