A Fast Elimination Method for Pruning in POMDPs

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9904)

Abstract

This paper aims to speed up the pruning procedure encountered in exact value iteration for POMDPs. The value function of a POMDP can be represented by a finite set of vectors over the state space. In each step of the exact value iteration algorithm, the number of possible vectors increases linearly with the cardinality of the action set and exponentially with the cardinality of the observation set. This set of vectors must be pruned to a minimal subset that retains the same value function over the state space; the pruning procedure is therefore, in general, the bottleneck of finding the optimal policy for POMDPs. This paper analyses two different linear programming methods for detecting these useless vectors: the classical Lark's algorithm and the recently proposed Skyline algorithm. We claim that, by using information about the support regions of the vectors that have already been processed, both algorithms can be drastically improved. We present comparative experiments on both randomly generated problems and POMDP benchmarks.
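
To illustrate the core operation both algorithms build on, the sketch below shows the standard linear-programming dominance test used in Lark's algorithm: given a candidate vector and the set of vectors kept so far, it searches for a belief point (a witness) at which the candidate strictly improves on every kept vector. This is a minimal sketch in Python using SciPy's linprog; the paper does not prescribe a particular solver or interface, and the function name find_witness is hypothetical.

    # Minimal sketch of the LP dominance test at the heart of Lark's
    # pruning algorithm. Assumption: SciPy's linprog as the LP solver;
    # the paper itself does not prescribe one.
    import numpy as np
    from scipy.optimize import linprog

    def find_witness(w, U):
        """Return a belief where w beats all vectors in U, or None."""
        w = np.asarray(w, dtype=float)
        n = len(w)                        # number of states
        if len(U) == 0:                   # nothing kept yet: any belief works
            return np.full(n, 1.0 / n)
        # Variables: belief x (n entries) followed by the margin d.
        # Maximize d  <=>  minimize -d.
        c = np.zeros(n + 1)
        c[-1] = -1.0
        # For each kept vector u:  x.(u - w) + d <= 0
        A_ub = np.hstack([np.asarray(U, dtype=float) - w,
                          np.ones((len(U), 1))])
        b_ub = np.zeros(len(U))
        # Belief simplex: sum(x) = 1, x >= 0; the margin d is unbounded.
        A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)
        b_eq = [1.0]
        bounds = [(0, 1)] * n + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds)
        if res.status == 0 and -res.fun > 1e-9:   # positive margin found
            return res.x[:n]              # witness belief point
        return None                       # w is dominated: prune it

The witness beliefs found for the kept vectors delimit their support regions over the belief simplex; the improvement claimed in the abstract comes from exploiting the support-region information of already-processed vectors instead of treating each dominance test independently.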

Keywords

Linear programming · POMDP · Pruning

References

  1. Cassandra, A.: Tony's POMDP file repository page (1999). http://www.cs.brown.edu/research/ai/pomdp/examples/index.html
  2. Cassandra, A., Littman, M.L., Zhang, N.L.: Incremental pruning: a simple, fast, exact method for partially observable Markov decision processes. In: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, pp. 54–61. Morgan Kaufmann Publishers Inc. (1997)
  3. Cassandra, A.R.: Exact and approximate algorithms for partially observable Markov decision processes. Brown University (1998)
  4. Feng, Z., Zilberstein, S.: Region-based incremental pruning for POMDPs. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 146–153. AUAI Press (2004)
  5. Harris, P.M.: Pivot selection methods of the Devex LP code. Math. Program. 5(1), 1–28 (1973)
  6. Hauskrecht, M.: Value-function approximations for partially observable Markov decision processes. J. Artif. Intell. Res. 13, 33–94 (2000)
  7. Hauskrecht, M., Fraser, H.: Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artif. Intell. Med. 18(3), 221–244 (2000)
  8. Hero, A.O., Castanon, D., Cochran, D., Kastella, K.: Foundations and Applications of Sensor Management. Springer Science & Business Media, New York (2007)
  9. Hoey, J., Poupart, P., von Bertoldi, A., Craig, T., Boutilier, C., Mihailidis, A.: Automated handwashing assistance for persons with dementia using video and a partially observable Markov decision process. Comput. Vis. Image Underst. 114(5), 503–519 (2010)
  10. Littman, M.L.: The Witness algorithm: solving partially observable Markov decision processes. Brown University, Providence (1994)
  11. Mallick, M., Krishnamurthy, V., Vo, B.N.: Integrated Tracking, Classification, and Sensor Management: Theory and Applications. Wiley, Hoboken (2012)
  12. Monahan, G.E.: State of the art - a survey of partially observable Markov decision processes: theory, models, and algorithms. Manage. Sci. 28(1), 1–16 (1982)
  13. Raphael, C., Shani, G.: The Skyline algorithm for POMDP value function pruning. Ann. Math. Artif. Intell. 65(1), 61–77 (2012)
  14. Smallwood, R.D., Sondik, E.J.: The optimal control of partially observable Markov processes over a finite horizon. Oper. Res. 21(5), 1071–1088 (1973)
  15. Temizer, S., Kochenderfer, M.J., Kaelbling, L.P., Lozano-Pérez, T., Kuchar, J.K.: Collision avoidance for unmanned aircraft using Markov decision processes. In: AIAA Guidance, Navigation, and Control Conference, Toronto, Canada (2010)
  16. Zhang, N.L., Liu, W.: Planning in stochastic domains: problem characteristics and approximation. Technical report HKUST-CS96-31, Department of Computer Science, Hong Kong University of Science and Technology (1996)
  17. Zhang, N.L., Zhang, W.: Speeding up the convergence of value iteration in partially observable Markov decision processes. J. Artif. Intell. Res. 14, 29–51 (2001)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Department of Electrical and Electronics Engineering, Middle East Technical University, Ankara, Turkey