Abstract
Continuous-state POMDPs provide a natural representation for a variety of tasks, including many in robotics. However, most existing parametric continuous-state POMDP approaches are limited by their reliance on a single linear model to represent the world dynamics. We introduce a new switching-state dynamics model that can represent multi-modal, state-dependent dynamics. We present the Switching Mode POMDP (SM-POMDP) planning algorithm for solving continuous-state POMDPs using this dynamics model. We also consider several procedures for approximating the value function as a mixture of a bounded number of Gaussians. Unlike the majority of prior work on approximate continuous-state POMDP planners, we provide a formal analysis of our SM-POMDP algorithm, giving bounds, where possible, on the quality of the resulting solution. We also analyze the computational complexity of SM-POMDP. Empirical results on an unmanned aerial vehicle collision avoidance simulation, and on a robot navigation simulation where the robot has faulty actuators, demonstrate the benefit of SM-POMDP over a prior parametric approach.
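To make the idea of multi-modal, state-dependent dynamics concrete, the following is a minimal illustrative sketch (not the paper's actual model or parameters): each mode has its own linear-Gaussian transition, and which mode fires depends probabilistically on the current state. The mode definitions, the `mode_probs` weighting, and the "faulty actuator" interpretation are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D switching dynamics: each mode m applies its own
# linear-Gaussian update s' = a_m * s + b_m + N(0, sigma_m^2).
MODES = [
    {"a": 1.0, "b": 0.5, "sigma": 0.10},  # nominal actuator: moves forward
    {"a": 1.0, "b": 0.0, "sigma": 0.05},  # faulty actuator: no net motion
]

def mode_probs(s):
    """State-dependent mode weights (an assumed form, for illustration)."""
    # e.g. the fault is more likely in a hypothetical "rough" region s > 2
    p_fault = 0.8 if s > 2.0 else 0.1
    return np.array([1.0 - p_fault, p_fault])

def step(s):
    """Sample the next state from the switching dynamics model."""
    m = rng.choice(len(MODES), p=mode_probs(s))
    mode = MODES[m]
    return mode["a"] * s + mode["b"] + rng.normal(0.0, mode["sigma"])

# Roll out a short trajectory from s = 0.
s = 0.0
traj = [s]
for _ in range(10):
    s = step(s)
    traj.append(s)
```

Under such a model, propagating a Gaussian belief through the state-dependent mixture of linear-Gaussian modes yields a Gaussian-mixture posterior, which is why the planner maintains value functions as mixtures of a bounded number of Gaussians.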
Additional information
This research was conducted while E. Brunskill was at the Massachusetts Institute of Technology.
Cite this article
Brunskill, E., Kaelbling, L.P., Lozano-Pérez, T. et al. Planning in partially-observable switching-mode continuous domains. Ann Math Artif Intell 58, 185–216 (2010). https://doi.org/10.1007/s10472-010-9202-1