Abstract
A partially-observable Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. We consider several flavors of finite-horizon POMDPs. Our results concern the complexity of the policy evaluation and policy existence problems, which are characterized in terms of completeness for complexity classes.
We prove a new upper bound for the policy evaluation problem for POMDPs, showing it is complete for probabilistic logspace (PL). From this, we prove policy existence problems for several variants of unobservable, succinctly represented MDPs to be complete for NP^PP, a class for which few natural complete problems are known.
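To make the policy evaluation problem concrete, the following is a minimal sketch (not from the paper) of evaluating a time-dependent policy on a small, explicitly represented finite-horizon MDP by backward dynamic programming. All names (`evaluate_policy`, the toy transition matrices `P`, rewards `R`) are illustrative assumptions; the paper's complexity results concern deciding properties of such expected values under various representations and observability assumptions.

```python
def evaluate_policy(P, R, policy, horizon, start):
    """Expected total reward of a time-dependent policy over `horizon` steps.

    P[a][s][s2] : probability of moving from state s to s2 under action a
    R[a][s]     : immediate reward for taking action a in state s
    policy[t][s]: action chosen at step t in state s
    """
    n = len(R[0])          # number of states
    value = [0.0] * n      # value with 0 steps remaining
    for t in reversed(range(horizon)):
        new = [0.0] * n
        for s in range(n):
            a = policy[t][s]
            # Bellman backup for the fixed policy action
            new[s] = R[a][s] + sum(P[a][s][s2] * value[s2] for s2 in range(n))
        value = new
    return value[start]

# Toy instance: two states, two actions; action 1 pays 1 in state 0 and
# moves to state 1 with probability 1/2; state 1 is absorbing with reward 0.
P = [[[1.0, 0.0], [0.0, 1.0]],      # action 0: stay put
     [[0.5, 0.5], [0.0, 1.0]]]      # action 1: possibly move to state 1
R = [[0.0, 0.0], [1.0, 0.0]]
policy = [[1, 1], [1, 1]]           # always take action 1, horizon 2
result = evaluate_policy(P, R, policy, 2, 0)
```

Here `result` is 1 + 1/2 = 1.5: the first step pays 1 and with probability 1/2 the process stays in state 0 for a second payoff. In the unobservable variants studied in the paper, the policy cannot depend on the state, only on the time step.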
Supported in part by the Office of the Vice Chancellor for Research and Graduate Studies at the University of Kentucky, and by the Deutsche Forschungsgemeinschaft (DFG), grant Mu 1226/2-1. Part of the work was done at University of Kentucky.
Supported in part by NSF grant CCR-9315354.
Supported in part by NSF grant 9509603. Portions of the work were performed while at the Institute of Mathematical Sciences, Chennai (Madras), India, and at the Wilhelm-Schickard Institut für Informatik, Universität Tübingen (supported by DFG grant TU 7/117-1).
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Mundhenk, M., Goldsmith, J., Allender, E. (1997). The complexity of policy evaluation for finite-horizon partially-observable Markov decision processes. In: Prívara, I., Ružička, P. (eds) Mathematical Foundations of Computer Science 1997. MFCS 1997. Lecture Notes in Computer Science, vol 1295. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0029956
Print ISBN: 978-3-540-63437-9
Online ISBN: 978-3-540-69547-9