Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations
We show that one can approximate the least fixed point solution for a multivariate system of monotone probabilistic max (min) polynomial equations, in time polynomial in both the encoding size of the system of equations and in log(1/ε), where ε > 0 is the desired additive error bound of the solution. (The model of computation is the standard Turing machine model.)
These equations form the Bellman optimality equations for several important classes of infinite-state Markov Decision Processes (MDPs). Thus, as a corollary, we obtain the first polynomial time algorithms for computing, to within arbitrary desired precision, the optimal value vector for several classes of infinite-state MDPs which arise as extensions of classic, and heavily studied, purely stochastic processes. These include the problems of both maximizing and minimizing the termination (extinction) probability of multi-type branching MDPs, stochastic context-free MDPs, and 1-exit Recursive MDPs. We also show that we can compute in P-time an ε-optimal policy for any given desired ε > 0.
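The equation systems in question have the form x = P(x), where each coordinate of P(x) is a max (or min) over probabilistic polynomials in x, and the quantity of interest is the least fixed point in [0,1]ⁿ. As an illustration of the equation form only — with made-up coefficients, and using plain Kleene/value iteration rather than the paper's polynomial-time algorithm (value iteration alone can converge very slowly in the worst case) — here is a minimal sketch:

```python
# Hypothetical two-type max-polynomial system x = P(x); all coefficients
# are invented for illustration and are NOT from the paper.

def P(x):
    """One application of the monotone max-polynomial operator."""
    x1, x2 = x
    # Type 1 is controlled: the controller maximizes termination probability
    # by choosing the better of two probabilistic rules (coefficients sum to 1).
    a = 0.7 * x1 * x1 + 0.3          # action a: 2 children of type 1 (p=0.7), or die (p=0.3)
    b = 0.5 * x1 * x2 + 0.5 * x2     # action b: children {1,2} (p=0.5), or {2} (p=0.5)
    y1 = max(a, b)
    # Type 2 is purely probabilistic (no choice).
    y2 = 0.2 * x1 + 0.5 * x2 * x2 + 0.3
    return (y1, y2)

def least_fixed_point(P, n=2, iters=10000, tol=1e-12):
    """Kleene iteration from the all-zero vector; since P is monotone, the
    iterates increase monotonically toward the least fixed point in [0,1]^n."""
    x = tuple(0.0 for _ in range(n))
    for _ in range(iters):
        y = P(x)
        if max(abs(u - v) for u, v in zip(x, y)) < tol:
            return y
        x = y
    return x

x_star = least_fixed_point(P)
```

For this particular system the least fixed point is strictly below 1 (the maximizing choice for type 1 satisfies x₁ = 0.7x₁² + 0.3, giving x₁ = 3/7), so iteration converges quickly; in general, achieving the log(1/ε) dependence of the paper requires its more sophisticated algorithm, not this sketch.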
Keywords: Optimal Policy · Markov Decision Process · Stochastic Game · Bellman Equation · Full Version