A survey of point-based POMDP solvers

Abstract

The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly due to the idea of restricting value function computations to a finite subset of the belief space, permitting only local value updates for this subset. This approach, known as point-based value iteration, avoids the exponential growth of the value function, and is thus applicable to domains with longer horizons, even with relatively large state spaces. Many extensions to this basic idea have been suggested, focusing on various aspects of the algorithm, mainly the selection of the belief space subset and the order of value function updates. In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas. Then, we survey the major extensions to the basic algorithm, discussing their merits. Finally, we include an extensive empirical analysis using well-known benchmarks, in order to shed light on the strengths and limitations of the various approaches.
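
As a concrete illustration of the point-based backup described above, the following Python sketch computes a single backup at one belief point, producing one new alpha-vector. It assumes the POMDP model is given as dense NumPy arrays T, O, and R (illustrative names and shapes, not notation taken from the paper), and is a minimal sketch of the core operation rather than any particular solver discussed in the survey.

import numpy as np

def point_based_backup(b, alphas, T, O, R, gamma):
    # One point-based Bellman backup at belief b (a minimal sketch).
    #   b      : (S,)       belief vector over states
    #   alphas : non-empty list of (S,) alpha-vectors forming the current value function
    #   T      : (A, S, S)  transition probabilities, T[a, s, s'] = P(s' | s, a)
    #   O      : (A, S, O)  observation probabilities, O[a, s', o] = P(o | a, s')
    #   R      : (A, S)     immediate rewards R[a, s]
    #   gamma  : discount factor in (0, 1)
    # Returns the best of the |A| candidate alpha-vectors at this belief point.
    num_actions = T.shape[0]
    num_obs = O.shape[2]
    best_alpha, best_value = None, -np.inf
    for a in range(num_actions):
        g_a = R[a].astype(float)
        for o in range(num_obs):
            # g_{a,o,alpha}(s) = sum_{s'} P(o | a, s') P(s' | s, a) alpha(s')
            candidates = [T[a] @ (O[a, :, o] * alpha) for alpha in alphas]
            # keep only the candidate that is best at this particular belief point
            g_a = g_a + gamma * max(candidates, key=lambda g: float(b @ g))
        if float(b @ g_a) > best_value:
            best_alpha, best_value = g_a, float(b @ g_a)
    return best_alpha

Repeating this backup over a fixed, finite set of belief points, and collecting the resulting alpha-vectors into the next value function, is the local update the abstract refers to; the surveyed algorithms differ mainly in how that belief set is chosen and in the order in which its points are backed up.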



Author information


Correspondence to Guy Shani.

Cite this article

Shani, G., Pineau, J. & Kaplow, R. A survey of point-based POMDP solvers. Auton Agent Multi-Agent Syst 27, 1–51 (2013). https://doi.org/10.1007/s10458-012-9200-2
