Approximating the Termination Value of One-Counter MDPs and Stochastic Games

  • Tomáš Brázdil
  • Václav Brožek
  • Kousha Etessami
  • Antonín Kučera
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6756)

Abstract

One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs) are, respectively, 1-player and 2-player turn-based zero-sum stochastic games played on the transition graph of classic one-counter automata (equivalently, pushdown automata with a 1-letter stack alphabet). A key objective for the analysis and verification of these games is the termination objective, where the players aim to maximize (respectively, minimize) the probability of hitting counter value 0, starting at a given control state and a given counter value.
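To make the termination objective concrete, the following is a minimal sketch of a maximizing OC-MDP and a naive value iteration on a truncated counter. It is not the algorithm of the paper: the encoding of actions, the function name `truncated_termination_value`, and the parameters `max_counter` and `sweeps` are assumptions introduced here for illustration. Counter values above `max_counter` are pessimistically treated as never terminating, so the result is only a lower bound on the optimal termination value.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: an action is a list of (probability, next_state, counter_change)
# with counter_change in {-1, 0, +1}; each control state offers a list of actions.
Action = List[Tuple[float, str, int]]


def truncated_termination_value(
    actions: Dict[str, List[Action]], max_counter: int, sweeps: int
) -> Dict[Tuple[str, int], float]:
    """Lower-bound the maximal probability of reaching counter value 0.

    After k sweeps, value[(s, c)] is the best probability of terminating
    within k steps from state s and counter c without the counter ever
    exceeding max_counter.
    """
    value = {(s, c): 0.0 for s in actions for c in range(1, max_counter + 1)}

    def lookup(state: str, counter: int) -> float:
        if counter == 0:
            return 1.0  # termination: counter value 0 has been hit
        if counter > max_counter:
            return 0.0  # truncation: assumed (pessimistically) never to terminate
        return value[(state, counter)]

    for _ in range(sweeps):
        # The controller maximizes over the actions available in each state.
        value = {
            (s, c): max(
                sum(p * lookup(t, c + d) for (p, t, d) in action)
                for action in actions[s]
            )
            for s in actions
            for c in range(1, max_counter + 1)
        }
    return value


# One control state with a choice between a downward-biased and an upward-biased step.
example = {
    "s": [
        [(0.6, "s", -1), (0.4, "s", +1)],
        [(0.4, "s", -1), (0.6, "s", +1)],
    ]
}
values = truncated_termination_value(example, max_counter=20, sweeps=2000)
print(values[("s", 1)])
```

In this toy instance the controller should always pick the downward-biased action. The sketch gives no guarantee on how far the truncated value is from the true one; obtaining such a guarantee is the hard part.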

Recently [4,2], we studied qualitative decision problems (“is the optimal termination value = 1?”) for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP∩coNP, respectively). However, quantitative decision and approximation problems (“is the optimal termination value ≥ p”, or “approximate the termination value within ε”) are far more challenging. This is so in part because optimal strategies may not exist, and because even when they do exist they can have a highly non-trivial structure. It thus remained open even whether any of these quantitative termination problems are computable.

In this paper we show that all quantitative approximation problems for the termination value for OC-MDPs and OC-SSGs are computable. Specifically, given an OC-SSG and given ε > 0, we can compute a value v that approximates the value of the OC-SSG termination game within additive error ε, and furthermore we can compute ε-optimal strategies for both players in the game.

A key ingredient in our proofs is a subtle martingale, derived from solving certain LPs that we can associate with a maximizing OC-MDP. An application of Azuma’s inequality on these martingales yields a computable bound for the “wealth” at which a “rich person’s strategy” becomes ε-optimal for OC-MDPs.
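For reference, the standard textbook form of Azuma's inequality is stated here. The symbols X_k, c_k, n and t are generic; the specific martingale and constants used in the paper are not given in this abstract. If (X_k) is a martingale with bounded differences |X_k − X_{k−1}| ≤ c_k for all k, then for every t > 0:

\[
\Pr\bigl(|X_n - X_0| \ge t\bigr) \;\le\; 2\exp\!\left(\frac{-t^2}{2\sum_{k=1}^{n} c_k^2}\right)
\]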

Keywords

Optimal Strategy; Markov Decision Process; Termination Objective; Transition Graph; Stochastic Game
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Berger, N., Kapur, N., Schulman, L.J., Vazirani, V.: Solvency Games. In: Proc. of FSTTCS 2008 (2008)
  2. Brázdil, T., Brožek, V., Etessami, K.: One-Counter Simple Stochastic Games. In: Proc. of FSTTCS 2010, pp. 108–119 (2010)
  3. Brázdil, T., Brožek, V., Etessami, K., Kučera, A.: Approximating the Termination Value of One-Counter MDPs and Stochastic Games. Tech. Rep. abs/1104.4978, CoRR (2011), http://arxiv.org/abs/1104.4978
  4. Brázdil, T., Brožek, V., Etessami, K., Kučera, A., Wojtczak, D.: One-Counter Markov Decision Processes. In: ACM-SIAM SODA, pp. 863–874 (2010). Full tech. report: CoRR abs/0904.2511 (2009), http://arxiv.org/abs/0904.2511
  5. Brázdil, T., Brožek, V., Forejt, V., Kučera, A.: Reachability in recursive Markov decision processes. In: Baier, C., Hermanns, H. (eds.) CONCUR 2006. LNCS, vol. 4137, pp. 358–374. Springer, Heidelberg (2006)
  6. Brázdil, T., Brožek, V., Kučera, A., Obdržálek, J.: Qualitative Reachability in stochastic BPA games. In: Proc. 26th STACS, pp. 207–218 (2009)
  7. Brázdil, T., Kiefer, S., Kučera, A.: Efficient analysis of probabilistic programs with an unbounded counter. CoRR abs/1102.2529 (2011)
  8. Etessami, K., Wojtczak, D., Yannakakis, M.: Quasi-birth-death processes, tree-like QBDs, probabilistic 1-counter automata, and pushdown systems. In: Proc. 5th Int. Symp. on Quantitative Evaluation of Systems (QEST), pp. 243–253 (2008)
  9. Etessami, K., Yannakakis, M.: Recursive Markov decision processes and recursive stochastic games. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 891–903. Springer, Heidelberg (2005)
  10. Etessami, K., Yannakakis, M.: Efficient qualitative analysis of classes of recursive Markov decision processes and simple stochastic games. In: Durand, B., Thomas, W. (eds.) STACS 2006. LNCS, vol. 3884. Springer, Heidelberg (2006)
  11. Grimmett, G.R., Stirzaker, D.R.: Probability and Random Processes, 2nd edn. Oxford U. Press, Oxford (1992)
  12. Lambert, J., Van Houdt, B., Blondia, C.: A policy iteration algorithm for Markov decision processes skip-free in one direction. In: ValueTools. ICST, Brussels (2007)
  13. Puterman, M.L.: Markov Decision Processes. J. Wiley and Sons, Chichester (1994)
  14. White, L.B.: A new policy iteration algorithm for Markov decision processes with quasi birth-death structure. Stochastic Models 21, 785–797 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tomáš Brázdil (1)
  • Václav Brožek (2)
  • Kousha Etessami (2)
  • Antonín Kučera (1)

  1. Faculty of Informatics, Masaryk University, Czech Republic
  2. School of Informatics, University of Edinburgh, UK
