Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations

  • Kousha Etessami
  • Alistair Stewart
  • Mihalis Yannakakis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7391)

Abstract

We show that one can approximate the least fixed point solution for a multivariate system of monotone probabilistic max (min) polynomial equations, in time polynomial in both the encoding size of the system of equations and in log(1/ε), where ε > 0 is the desired additive error bound of the solution. (The model of computation is the standard Turing machine model.)
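
For concreteness, a system of this kind (a max-PPS; the min case is analogous, with min in place of max) has the form

    x_i \;=\; \max_{a \in A_i} \, P_{i,a}(x_1, \ldots, x_n), \qquad i = 1, \ldots, n,

where A_i is the finite set of choices available for coordinate i and each P_{i,a} is a polynomial whose coefficients are nonnegative and sum to at most 1; the quantity approximated is the least fixed point q^* \in [0,1]^n of this system. As a made-up one-variable illustration (not an example from the paper), the max-PPS x = \max\{\, 0.6\,x^2 + 0.4,\; 0.5\,x + 0.3 \,\} has least fixed point q^* = 2/3, attained by the quadratic choice, since 0.6\,(2/3)^2 + 0.4 = 2/3 while 0.5\,(2/3) + 0.3 \approx 0.633 < 2/3, and 0.6\,x^2 + 0.4 > x for all x < 2/3.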

These equations form the Bellman optimality equations for several important classes of infinite-state Markov Decision Processes (MDPs). Thus, as a corollary, we obtain the first polynomial time algorithms for computing to within arbitrary desired precision the optimal value vector for several classes of infinite-state MDPs which arise as extensions of classic, and heavily studied, purely stochastic processes. These include the problems of both maximizing and minimizing the termination (extinction) probability of multi-type branching MDPs, stochastic context-free MDPs, and 1-exit Recursive MDPs. We also show that we can compute in P-time an ε-optimal policy for any given desired ε > 0.
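
As a concrete illustration of the Bellman equations in the maximizing case, the sketch below runs simple value (Kleene) iteration on the extinction-probability equations of a small hypothetical 2-type branching MDP; the BMDP, its probabilities, and the function names are invented for illustration, not taken from the paper. Starting from the zero vector, the iterates increase monotonically to the least fixed point, i.e. the optimal extinction probabilities. Note that plain value iteration can converge slowly and is not the source of the paper's polynomial-time bound; it is used here only to show the form of the equations.

    # Value (Kleene) iteration on the Bellman equations of a small, hypothetical
    # 2-type branching MDP in which the controller maximizes the extinction
    # (termination) probability.  Illustrative sketch only: the paper's
    # polynomial-time guarantee is not obtained by plain value iteration.

    def bellman(x):
        """One application of the max-PPS operator F for the example BMDP.

        Type 1 has two actions:
          a: with prob 0.6 spawn two type-1 children, with prob 0.4 no offspring
          b: with prob 0.5 spawn one type-2 child,  with prob 0.5 no offspring
        Type 2 has a single action:
          with prob 0.9 spawn one type-1 and one type-2 child, with prob 0.1 none
        """
        x1, x2 = x
        new_x1 = max(0.6 * x1 * x1 + 0.4,   # action a
                     0.5 * x2 + 0.5)        # action b
        new_x2 = 0.9 * x1 * x2 + 0.1
        return (new_x1, new_x2)

    def approximate_lfp(eps=1e-9, max_iters=100_000):
        """Iterate x <- F(x) starting from 0; the iterates increase monotonically
        toward the least fixed point q* in [0,1]^2 (the optimal extinction
        probabilities).  Stopping when successive iterates differ by < eps is a
        heuristic stopping rule for this sketch; it does not by itself certify
        that the result is within eps of q*."""
        x = (0.0, 0.0)
        for _ in range(max_iters):
            nxt = bellman(x)
            if max(abs(a - b) for a, b in zip(nxt, x)) < eps:
                return nxt
            x = nxt
        return x

    if __name__ == "__main__":
        q1, q2 = approximate_lfp()
        print("approximate optimal extinction probabilities:", q1, q2)
        # For this instance the least fixed point is (2/3, 1/4), and the greedy
        # choice at (an approximation of) it picks action a for type 1, since
        # 0.6*(2/3)**2 + 0.4 = 2/3 exceeds 0.5*(1/4) + 0.5 = 0.625.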

Keywords

Optimal Policy · Markov Decision Process · Stochastic Game · Bellman Equation · Full Version

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Kousha Etessami, School of Informatics, University of Edinburgh, UK
  • Alistair Stewart, School of Informatics, University of Edinburgh, UK
  • Mihalis Yannakakis, Department of Computer Science, Columbia University, USA
