Encyclopedia of Machine Learning

2010 Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Partially Observable Markov Decision Processes

  • Pascal Poupart
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_629



A partially observable Markov decision process (POMDP) refers to a class of sequential decision-making problems under uncertainty. This class includes problems with partially observable states and uncertain action effects. A POMDP is formally defined by a tuple \(\langle \mathcal{S},\ \mathcal{A},\ \mathcal{O},\ T,\ Z,\ R,\ {b}_{0},\ h,\ \gamma \rangle\), where \(\mathcal{S}\) is the set of states, \(\mathcal{A}\) the set of actions, \(\mathcal{O}\) the set of observations, \(T(s,a,s^{\prime}) = \Pr (s^{\prime} \mid s,a)\) the transition function, \(Z(a,s^{\prime},o) = \Pr (o \mid a,s^{\prime})\) the observation function, \(R(s,a)\) the reward function, \({b}_{0}\) the initial belief over states, \(h\) the planning horizon, and \(\gamma \in [0,1]\) the discount factor.
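Since the state is not directly observable, an agent maintains a belief (a probability distribution over states) that it revises after each action and observation via Bayes' rule: \(b^{\prime}(s^{\prime}) \propto Z(a,s^{\prime},o)\sum_{s}T(s,a,s^{\prime})\,b(s)\). The sketch below illustrates this belief update on a hypothetical two-state problem; the model numbers and the names `T`, `Z`, and `belief_update` are illustrative assumptions, not part of this entry.

```python
import numpy as np

# Hypothetical two-state POMDP with a single information-gathering action.
# T[a][s, s'] : transition probabilities Pr(s' | s, a)
# Z[a][s', o] : observation probabilities Pr(o | a, s')
T = {"listen": np.eye(2)}                      # listening does not change the state
Z = {"listen": np.array([[0.85, 0.15],         # observations are correct 85% of the time
                         [0.15, 0.85]])}
b0 = np.array([0.5, 0.5])                      # uniform initial belief b_0

def belief_update(b, a, o, T, Z):
    """Bayesian belief update: b'(s') ∝ Z(a,s',o) * sum_s T(s,a,s') b(s)."""
    b_pred = b @ T[a]                # prediction step: sum_s b(s) T(s,a,s')
    b_new = Z[a][:, o] * b_pred      # correction step: weight by observation likelihood
    return b_new / b_new.sum()       # normalize to a probability distribution

b1 = belief_update(b0, "listen", 0, T, Z)      # after observing o = 0
# → belief shifts toward state 0: [0.85, 0.15]
```

Because the belief is a sufficient statistic for the observation history, POMDP planning can be viewed as planning in a continuous belief-state MDP, which is what the point-based and controller-based algorithms in the readings below exploit.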


Recommended Reading

  1. Aberdeen, D., & Baxter, J. (2002). Scalable internal-state policy-gradient methods for POMDPs. In International Conference on Machine Learning, pp. 3–10.
  2. Amato, C., Bernstein, D. S., & Zilberstein, S. (2009). Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Journal of Autonomous Agents and Multi-Agent Systems, 21, 293–320.
  3. Amato, C., Bernstein, D. S., & Zilberstein, S. (2007). Solving POMDPs using quadratically constrained linear programs. In International Joint Conference on Artificial Intelligence, pp. 2418–2424.
  4. Aström, K. J. (1965). Optimal control of Markov decision processes with incomplete state estimation. Journal of Mathematical Analysis and Applications, 10, 174–205.
  5. Boutilier, C., & Poole, D. (1996). Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 1168–1175.
  6. Buede, D. M. (1999). Dynamic decision networks: An approach for solving the dual control problem. Spring INFORMS, Cincinnati.
  7. Drake, A. (1962). Observation of a Markov process through a noisy channel. PhD thesis, Massachusetts Institute of Technology.
  8. Hansen, E. (1997). An improved policy iteration algorithm for partially observable MDPs. In Neural Information Processing Systems, pp. 1015–1021.
  9. Hauskrecht, M., & Fraser, H. S. F. (2000). Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial Intelligence in Medicine, 18, 221–244.
  10. Hoey, J., Poupart, P., von Bertoldi, A., Craig, T., Boutilier, C., & Mihailidis, A. (2010). Automated handwashing assistance for persons with dementia using video and a partially observable Markov decision process. Computer Vision and Image Understanding, 114, 503–519.
  11. Kaelbling, L. P., Littman, M., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99–134.
  12. Meuleau, N., Peshkin, L., Kim, K.-E., & Kaelbling, L. P. (1999). Learning finite-state controllers for partially observable environments. In Uncertainty in Artificial Intelligence, pp. 427–436.
  13. Pineau, J., & Gordon, G. (2005). POMDP planning for robust robot control. In International Symposium on Robotics Research, pp. 69–82.
  14. Pineau, J., Gordon, G. J., & Thrun, S. (2003). Policy-contingent abstraction for robust robot control. In Uncertainty in Artificial Intelligence, pp. 477–484.
  15. Pineau, J., Gordon, G., & Thrun, S. (2006). Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research, 27, 335–380.
  16. Gmytrasiewicz, P. J., & Doshi, P. (2005). A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research, 24, 49–79.
  17. Porta, J. M., Vlassis, N. A., Spaan, M. T. J., & Poupart, P. (2006). Point-based value iteration for continuous POMDPs. Journal of Machine Learning Research, 7, 2329–2367.
  18. Poupart, P., & Boutilier, C. (2004). VDCBPI: An approximate scalable algorithm for large POMDPs. In Neural Information Processing Systems, pp. 1081–1088.
  19. Poupart, P., & Vlassis, N. (2008). Model-based Bayesian reinforcement learning in partially observable domains. In International Symposium on Artificial Intelligence and Mathematics (ISAIM).
  20. Puterman, M. L. (1994). Markov decision processes. New York: Wiley.
  21. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257–286.
  22. Ross, S., Chaib-Draa, B., & Pineau, J. (2007). Bayes-adaptive POMDPs. In Advances in Neural Information Processing Systems (NIPS).
  23. Ross, S., Pineau, J., Paquet, S., & Chaib-draa, B. (2008). Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research, 32, 663–704.
  24. Roy, N., Gordon, G. J., & Thrun, S. (2005). Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 23, 1–40.
  25. Shani, G., & Meek, C. (2009). Improving existing fault recovery policies. In Neural Information Processing Systems.
  26. Shani, G., Brafman, R. I., Shimony, S. E., & Poupart, P. (2008). Efficient ADD operations for point-based algorithms. In International Conference on Automated Planning and Scheduling, pp. 330–337.
  27. Sim, H. S., Kim, K.-E., Kim, J. H., Chang, D.-S., & Koo, M.-W. (2008). Symbolic heuristic search value iteration for factored POMDPs. In Twenty-Third National Conference on Artificial Intelligence (AAAI), pp. 1088–1093.
  28. Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21, 1071–1088.
  29. Theocharous, G., & Mahadevan, S. (2002). Approximate planning with hierarchical partially observable Markov decision process models for robot navigation. In IEEE International Conference on Robotics and Automation, pp. 1347–1352.
  30. Thomson, B., & Young, S. (2010). Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech & Language, 24, 562–588.
  31. Toussaint, M., Charlin, L., & Poupart, P. (2008). Hierarchical POMDP controller optimization by likelihood maximization. In Uncertainty in Artificial Intelligence, pp. 562–570.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Pascal Poupart
    1. University of Waterloo, Waterloo, Canada