Abstract
Markov decision processes (MDPs) are the standard formalism for modelling sequential decision making in stochastic environments. Policy synthesis addresses the problem of how to control or limit the decisions an agent makes so that a given specification is met. In this paper we consider PCTL*, the probabilistic counterpart of CTL*, as the specification language. Because the policy synthesis problem for PCTL* is undecidable in general, we restrict attention to policies whose memory of the execution history is finitely bounded a priori. Surprisingly, no policy synthesis algorithm has so far been developed for this natural and expressive framework. We close this gap and describe a tableau-based algorithm that, given an MDP and a PCTL* specification, non-deterministically derives a system of (possibly nonlinear) equalities and inequalities. The solutions of this system, if any, describe the desired (stochastic) policies. Our main result is the correctness of our method, i.e., soundness, completeness and termination.
Notes
- 1.
- 2. By the semantics of the \(\boldsymbol{\mathsf{P}}\)-operator, the sub-derivation has to start from \(\langle \mathrm{start}(s),s\rangle\), not \(\langle m,s\rangle\).
- 3. The argument is standard for calculi based on formula expansion, as embodied in the \(\boldsymbol{\mathsf{U}}\) and \(\lnot \boldsymbol{\mathsf{U}}\) rules: every set of formulas obtainable by these rules is a subset of an a priori determined finite set of formulas. This set consists of all subformulas of the given formula, closed under negation and other operators. Any infinite branch would hence have to repeat one of these sets infinitely often, which is impossible with the loop rules. Moreover, the state set \(S\) and the mode set \(M\) are finite, so the other rules do not cause problems either.
- 4. In terms of the resulting program, \((x_u)_{\langle m,s\rangle}^\psi\) is not constrained to any specific value in \([0,1]\). This can be shown by “substituting in” the equalities in \(\Gamma_\mathrm{final}\) for the probabilities of the pivots in the subtree below \(u\) and simplifying arithmetically.
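The finiteness claim in note 3 can be illustrated with a small sketch. It assumes a toy formula syntax with only atoms, negation and until; the class names `Atom`, `Not`, `Until` and the helper `closure` are hypothetical and not taken from the paper. The paper's set is additionally closed under "other operators", which this sketch does not model. It only shows that closing the subformulas under (single) negation yields a finite set, so the formula sets appearing along a branch are drawn from finitely many candidates.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy syntax: atoms, negation, and until only.
@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class Until:
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, Not, Until]


def subformulas(f):
    """Yield f and all of its subformulas."""
    yield f
    if isinstance(f, Not):
        yield from subformulas(f.arg)
    elif isinstance(f, Until):
        yield from subformulas(f.left)
        yield from subformulas(f.right)


def negate(f):
    """Negate f, collapsing double negation so the closure stays finite."""
    return f.arg if isinstance(f, Not) else Not(f)


def closure(f):
    """All subformulas of f together with their (collapsed) negations."""
    subs = set(subformulas(f))
    return subs | {negate(g) for g in subs}


# Example: a U (not b)
phi = Until(Atom("a"), Not(Atom("b")))
cl = closure(phi)

# Every formula set on a branch is a subset of cl,
# so there are at most 2 ** len(cl) distinct such sets.
max_distinct_sets = 2 ** len(cl)
```

Because every node label is a subset of `cl`, an infinite branch would have to revisit some label, which is the situation the loop rules are said to exclude.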
Acknowledgements
This research was funded by AFOSR grant FA2386-15-1-4015. We would also like to thank the anonymous reviewers for their constructive and helpful comments.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Baumgartner, P., Thiébaux, S., Trevizan, F. (2017). Tableaux for Policy Synthesis for MDPs with PCTL* Constraints. In: Schmidt, R., Nalon, C. (eds) Automated Reasoning with Analytic Tableaux and Related Methods. TABLEAUX 2017. Lecture Notes in Computer Science(), vol 10501. Springer, Cham. https://doi.org/10.1007/978-3-319-66902-1_11
DOI: https://doi.org/10.1007/978-3-319-66902-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66901-4
Online ISBN: 978-3-319-66902-1
eBook Packages: Computer Science, Computer Science (R0)