Abstract
Motion planning under uncertainty that can efficiently account for changes in the environment is critical for robots to operate reliably in our living spaces. The Partially Observable Markov Decision Process (POMDP) provides a systematic and general framework for motion planning under uncertainty. Point-based POMDP methods have advanced POMDP planning tremendously over the past few years, making POMDP planning practical for many simple to moderately difficult robotics problems. However, when environmental changes alter the POMDP model, most existing POMDP planners recompute the solution from scratch, often wasting the significant computation already spent on solving the original problem. In this paper, we propose a novel algorithm, called Point-Based Policy Transformation (PBPT), that solves the altered POMDP problem by transforming the solution of the original problem to accommodate the changes. PBPT uses the point-based POMDP approach: it transforms the original solution by modifying the set of sampled beliefs that represents the belief space B, and then uses this new set of sampled beliefs to revise the original solution. Preliminary results indicate that PBPT generates a good policy for the altered POMDP model in a matter of minutes, whereas recomputing the policy with the fastest offline POMDP planner available today fails to find a policy of similar quality after two hours of planning, even when the policy for the original problem is reused as the initial policy.
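The core idea the abstract describes, keeping the sampled beliefs and the original solution and revising them under the altered model rather than solving from scratch, can be illustrated with a point-based value-iteration backup on a toy POMDP. This is a minimal sketch of the general warm-start idea only, not the PBPT algorithm itself (which also transforms the belief set); the model numbers and function names below are invented for illustration.

```python
import numpy as np

def point_based_backup(beliefs, alphas, T, Z, R, gamma):
    """One point-based value-iteration backup at each sampled belief.

    T[a]: |S|x|S| transition matrix, Z[a]: |S|x|O| observation matrix,
    R[a]: length-|S| reward vector, alphas: list of length-|S| alpha-vectors.
    Returns one new alpha-vector per sampled belief.
    """
    new_alphas = []
    for b in beliefs:
        best_val, best_alpha = -np.inf, None
        for a in range(len(T)):
            alpha_a = R[a].copy()
            for o in range(Z[a].shape[1]):
                # g_i(s) = sum_{s'} T[a][s,s'] * Z[a][s',o] * alpha_i(s')
                gs = [T[a] @ (Z[a][:, o] * al) for al in alphas]
                alpha_a += gamma * max(gs, key=lambda g: g @ b)
            if alpha_a @ b > best_val:
                best_val, best_alpha = alpha_a @ b, alpha_a
        new_alphas.append(best_alpha)
    return new_alphas

def value(b, alphas):
    return max(al @ b for al in alphas)

# Toy 2-state, 2-action, 2-observation model (tiger-like; numbers invented).
T = [np.eye(2), np.ones((2, 2)) / 2]                   # listen, open
Z = [np.array([[0.85, 0.15], [0.15, 0.85]]),           # informative obs
     np.ones((2, 2)) / 2]                              # uninformative
R = [np.array([-1.0, -1.0]), np.array([10.0, -100.0])]
gamma = 0.95

beliefs = [np.array([p, 1 - p]) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
# Blind lower bound as the initial alpha-vector.
alphas = [np.full(2, min(r.min() for r in R) / (1 - gamma))]
for _ in range(100):
    alphas = point_based_backup(beliefs, alphas, T, Z, R, gamma)
orig_value = value(np.array([0.5, 0.5]), alphas)

# Model change: the penalty for a wrong "open" grows.
R_altered = [R[0], np.array([10.0, -150.0])]
# Warm start: reuse the sampled beliefs and the original alpha-vectors.
alphas2 = list(alphas)
for _ in range(10):
    alphas2 = point_based_backup(beliefs, alphas2, T, Z, R_altered, gamma)
```

In this sketch the warm start only reuses the old alpha-vectors and belief set; PBPT additionally modifies the sampled beliefs themselves so that they remain representative of the altered problem before revising the policy.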
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Kurniawati, H., Patrikalakis, N.M. (2013). Point-Based Policy Transformation: Adapting Policy to Changing POMDP Models. In: Frazzoli, E., Lozano-Perez, T., Roy, N., Rus, D. (eds) Algorithmic Foundations of Robotics X. Springer Tracts in Advanced Robotics, vol 86. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36279-8_30
DOI: https://doi.org/10.1007/978-3-642-36279-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36278-1
Online ISBN: 978-3-642-36279-8