A survey of point-based POMDP solvers

Abstract

The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly due to the idea of restricting value function computations to a finite subset of the belief space, permitting only local value updates for this subset. This approach, known as point-based value iteration, avoids the exponential growth of the value function, and is thus applicable to domains with longer horizons, even with relatively large state spaces. Many extensions to this basic idea have been suggested, focusing on various aspects of the algorithm, mainly the selection of the belief space subset and the order of value function updates. In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas. Then, we survey the major extensions to the basic algorithm, discussing their merits. Finally, we include an extensive empirical analysis using well-known benchmarks, in order to shed light on the strengths and limitations of the various approaches.
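
As a concrete illustration of the point-based backup described above, the following Python sketch computes a single backup at one belief point, producing one new alpha-vector. It assumes the POMDP model is given as dense NumPy arrays T, O, and R (illustrative names and shapes, not notation taken from the paper), and is a minimal sketch of the core operation rather than any particular solver discussed in the survey.

import numpy as np

def point_based_backup(b, alphas, T, O, R, gamma):
    # One point-based Bellman backup at belief b (a minimal sketch).
    #   b      : (S,)       belief vector over states
    #   alphas : non-empty list of (S,) alpha-vectors forming the current value function
    #   T      : (A, S, S)  transition probabilities, T[a, s, s'] = P(s' | s, a)
    #   O      : (A, S, O)  observation probabilities, O[a, s', o] = P(o | a, s')
    #   R      : (A, S)     immediate rewards R[a, s]
    #   gamma  : discount factor in (0, 1)
    # Returns the best of the |A| candidate alpha-vectors at this belief point.
    num_actions = T.shape[0]
    num_obs = O.shape[2]
    best_alpha, best_value = None, -np.inf
    for a in range(num_actions):
        g_a = R[a].astype(float)
        for o in range(num_obs):
            # g_{a,o,alpha}(s) = sum_{s'} P(o | a, s') P(s' | s, a) alpha(s')
            candidates = [T[a] @ (O[a, :, o] * alpha) for alpha in alphas]
            # keep only the candidate that is best at this particular belief point
            g_a = g_a + gamma * max(candidates, key=lambda g: float(b @ g))
        if float(b @ g_a) > best_value:
            best_alpha, best_value = g_a, float(b @ g_a)
    return best_alpha

Repeating this backup over a fixed, finite set of belief points, and collecting the resulting alpha-vectors into the next value function, is the local update the abstract refers to; the surveyed algorithms differ mainly in how that belief set is chosen and in the order in which its points are backed up.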



Author information


Correspondence to Guy Shani.

Cite this article

Shani, G., Pineau, J. & Kaplow, R. A survey of point-based POMDP solvers. Auton Agent Multi-Agent Syst 27, 1–51 (2013). https://doi.org/10.1007/s10458-012-9200-2
