Tableaux for Policy Synthesis for MDPs with PCTL* Constraints

  • Conference paper
Automated Reasoning with Analytic Tableaux and Related Methods (TABLEAUX 2017)

Abstract

Markov decision processes (MDPs) are the standard formalism for modelling sequential decision making in stochastic environments. Policy synthesis addresses the problem of how to control or limit the decisions an agent makes so that a given specification is met. In this paper we consider PCTL*, the probabilistic counterpart of CTL*, as the specification language. Because the policy synthesis problem for PCTL* is undecidable in general, we restrict ourselves to policies whose execution history memory is finitely bounded a priori. Surprisingly, no algorithm for policy synthesis for this natural and expressive framework has been developed so far. We close this gap and describe a tableau-based algorithm that, given an MDP and a PCTL* specification, derives in a non-deterministic way a system of (possibly nonlinear) equalities and inequalities. The solutions of this system, if any, describe the desired (stochastic) policies. Our main result in this paper is the correctness of our method, i.e., soundness, completeness and termination.
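To give a feel for why such a system can be nonlinear, the following minimal sketch works through a hypothetical three-state MDP that is not taken from the paper. The states, actions, transition probabilities and the function names `residual`, `reach_probability` and `satisfies` are all illustrative assumptions, and the code does not implement the tableau calculus; it only shows how a stochastic policy parameter x and a reachability probability p end up multiplied together in one equation.

```python
from fractions import Fraction

# Hypothetical toy MDP (an assumption, not an example from the paper).
# In state s0 the agent chooses between two actions:
#   action a: moves to goal with prob. 9/10, to fail with prob. 1/10
#   action b: moves to goal with prob. 1/2, back to s0 with prob. 1/2
# A memoryless stochastic policy picks a with probability x and b with 1 - x.


def residual(x, p):
    """Residual of the equation p = x*(9/10) + (1 - x)*(1/2 + p/2).

    p stands for the probability of eventually reaching goal from s0.
    The product (1 - x)*p makes the equation nonlinear in (x, p).
    """
    x, p = Fraction(x), Fraction(p)
    return p - (x * Fraction(9, 10) + (1 - x) * (Fraction(1, 2) + p / 2))


def reach_probability(x):
    """Solve the equation above for p, for a fixed policy parameter x."""
    x = Fraction(x)
    return (Fraction(1, 2) + Fraction(2, 5) * x) / (Fraction(1, 2) + x / 2)


def satisfies(x, bound):
    """Check the constraint 'reach goal with probability at least bound'."""
    x = Fraction(x)
    return 0 <= x <= 1 and reach_probability(x) >= Fraction(bound)


if __name__ == "__main__":
    bound = Fraction(19, 20)
    for x in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
        p = reach_probability(x)
        assert residual(x, p) == 0  # p really solves the equation
        print(f"x = {x}: p = {p}, meets bound {bound}: {satisfies(x, bound)}")
```

In this toy setting, every x in [0, 1] that passes `satisfies` describes a stochastic policy meeting the reachability bound. Exact rational arithmetic is used so that the boundary case is not blurred by floating-point rounding.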

Notes

  1. http://projects.coin-or.org/.

  2. By the semantics of the \(\boldsymbol{\mathsf{P}}\)-operator, the sub-derivation has to start from \(\langle \mathrm{start}(s), s\rangle\), not \(\langle m, s\rangle\).

  3. The argument is standard for calculi based on formula expansion, as embodied in the \(\boldsymbol{\mathsf{U}}\) and \(\lnot \boldsymbol{\mathsf{U}}\) rules: the sets of formulas obtainable by these rules are subsets of an a priori determined finite set of formulas. This set consists of all subformulas of the given formula, closed under negation and other operators. Any infinite branch would hence have to repeat one of these sets infinitely often, which is impossible with the loop rules. Moreover, the state set S and the mode set M are finite, so the other rules do not cause problems either.

  4. In terms of the resulting program, \((x_u)_{\langle m,s\rangle}^{\psi}\) is not constrained to any specific value in [0..1]. This can be shown by “substituting in” the equalities in \(\Gamma_{\mathrm{final}}\) for the probabilities of the pivots in the subtree below u and arithmetic simplifications.
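The finiteness argument in note 3 can be illustrated with a small sketch. It assumes a generic, textbook-style closure (subformulas, the next-step formulas introduced by expanding until, and single negations), which is not necessarily the exact set used in the paper. The tuple encoding of formulas and the names `subformulas`, `negate`, `closure` and `expand_until` are hypothetical.

```python
# Formulas are encoded as nested tuples (an assumption for illustration):
#   "a"                an atomic proposition
#   ("not", f)         negation
#   ("X", f)           next
#   ("U", f, g)        until


def subformulas(f):
    """Return the set of all subformulas of f, including f itself."""
    if isinstance(f, str):
        return {f}
    result = {f}
    for arg in f[1:]:
        result |= subformulas(arg)
    return result


def negate(f):
    """Negate f, collapsing double negation so the closure stays finite."""
    if isinstance(f, tuple) and f[0] == "not":
        return f[1]
    return ("not", f)


def closure(f):
    """Subformulas, X-wrapped until formulas, and their single negations."""
    subs = subformulas(f)
    nexts = {("X", g) for g in subs if isinstance(g, tuple) and g[0] == "U"}
    base = subs | nexts
    return base | {negate(g) for g in base}


def expand_until(f):
    """One expansion step: f U g holds if g holds, or f holds and X(f U g)."""
    _, left, right = f
    return [{right}, {left, ("X", f)}]


if __name__ == "__main__":
    until = ("U", "a", "b")
    cl = closure(until)
    print(len(cl), "formulas in the closure")
    # Every formula produced by the expansion stays inside the closure.
    assert all(branch <= cl for branch in expand_until(until))
```

Because each expansion step only produces formulas from this fixed finite set, a branch can visit only finitely many distinct formula sets, which is the observation the termination argument relies on.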

Acknowledgements

This research was funded by AFOSR grant FA2386-15-1-4015. We would also like to thank the anonymous reviewers for their constructive and helpful comments.

Author information

Corresponding author

Correspondence to Peter Baumgartner.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Baumgartner, P., Thiébaux, S., Trevizan, F. (2017). Tableaux for Policy Synthesis for MDPs with PCTL* Constraints. In: Schmidt, R., Nalon, C. (eds) Automated Reasoning with Analytic Tableaux and Related Methods. TABLEAUX 2017. Lecture Notes in Computer Science, vol 10501. Springer, Cham. https://doi.org/10.1007/978-3-319-66902-1_11

  • DOI: https://doi.org/10.1007/978-3-319-66902-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66901-4

  • Online ISBN: 978-3-319-66902-1

  • eBook Packages: Computer Science, Computer Science (R0)