Abstract
A partially-observable Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. We consider several flavors of finite-horizon POMDPs. Our results concern the complexity of the policy evaluation and policy existence problems, which are characterized in terms of completeness for complexity classes.
We prove a new upper bound for the policy evaluation problem for POMDPs, showing it is complete for probabilistic logspace (PL). From this, we prove policy existence problems for several variants of unobservable, succinctly represented MDPs to be complete for NP^PP, a class for which few natural complete problems are known.
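To make the policy evaluation problem concrete, the following is a minimal sketch (not from the paper) of evaluating a time-dependent policy on a small, explicitly represented finite-horizon MDP by backward dynamic programming. All names (`evaluate_policy`, the toy transition matrices `P`, rewards `R`) are illustrative assumptions; the paper's complexity results concern deciding properties of such expected values under various representations and observability assumptions.

```python
def evaluate_policy(P, R, policy, horizon, start):
    """Expected total reward of a time-dependent policy over `horizon` steps.

    P[a][s][s2] : probability of moving from state s to s2 under action a
    R[a][s]     : immediate reward for taking action a in state s
    policy[t][s]: action chosen at step t in state s
    """
    n = len(R[0])          # number of states
    value = [0.0] * n      # value with 0 steps remaining
    for t in reversed(range(horizon)):
        new = [0.0] * n
        for s in range(n):
            a = policy[t][s]
            # Bellman backup for the fixed policy action
            new[s] = R[a][s] + sum(P[a][s][s2] * value[s2] for s2 in range(n))
        value = new
    return value[start]

# Toy instance: two states, two actions; action 1 pays 1 in state 0 and
# moves to state 1 with probability 1/2; state 1 is absorbing with reward 0.
P = [[[1.0, 0.0], [0.0, 1.0]],      # action 0: stay put
     [[0.5, 0.5], [0.0, 1.0]]]      # action 1: possibly move to state 1
R = [[0.0, 0.0], [1.0, 0.0]]
policy = [[1, 1], [1, 1]]           # always take action 1, horizon 2
result = evaluate_policy(P, R, policy, 2, 0)
```

Here `result` is 1 + 1/2 = 1.5: the first step pays 1 and with probability 1/2 the process stays in state 0 for a second payoff. In the unobservable variants studied in the paper, the policy cannot depend on the state, only on the time step.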
Supported in part by the Office of the Vice Chancellor for Research and Graduate Studies at the University of Kentucky, and by the Deutsche Forschungsgemeinschaft (DFG), grant Mu 1226/2-1. Part of the work was done at University of Kentucky.
Supported in part by NSF grant CCR-9315354.
Supported in part by NSF grant 9509603. Portions of the work were performed while at the Institute of Mathematical Sciences, Chennai (Madras), India, and at the Wilhelm-Schickard Institut für Informatik, Universität Tübingen (supported by DFG grant TU 7/117-1).
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Mundhenk, M., Goldsmith, J., Allender, E. (1997). The complexity of policy evaluation for finite-horizon partially-observable Markov decision processes. In: Prívara, I., Ružička, P. (eds) Mathematical Foundations of Computer Science 1997. MFCS 1997. Lecture Notes in Computer Science, vol 1295. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0029956
Print ISBN: 978-3-540-63437-9
Online ISBN: 978-3-540-69547-9