Abstract
Uncertainty in motion planning often stems from three main sources: motion error, sensing error, and an imperfect environment map. Despite the significant effect of all three sources of uncertainty on motion planning problems, most planners take into account only one, or at most two, of them. We propose a new motion planner, called Guided Cluster Sampling (GCS), that takes all three sources of uncertainty into account for robots with active sensing capabilities. GCS uses the Partially Observable Markov Decision Process (POMDP) framework and the point-based POMDP approach. Although point-based POMDP solvers have made impressive progress over the past few years, they perform poorly when the environment map is imperfect. This poor performance is due to the extremely high-dimensional state space, which translates into an extremely large belief space B.
We alleviate this problem by constructing a more suitable sampling distribution, based on two observations: when the robot has active sensing capability, B can be partitioned into a collection of much smaller sub-spaces, and an optimal policy can often be generated by sufficient sampling of a small subset of this collection. Utilizing these observations, GCS samples B in two stages: a subspace is sampled from the collection, and then a belief is sampled from that subspace.
It uses information from the set of sampled sub-spaces and sampled beliefs to guide subsequent sampling. Simulation results on marine robotics scenarios suggest that GCS can generate reasonable policies for motion planning problems with uncertain motion, sensing, and environment map that are unsolvable by the best point-based POMDP solvers today. Furthermore, GCS handles POMDPs with continuous state, action, and observation spaces. We show that for a class of POMDPs that often occurs in robot motion planning, given enough time, GCS converges to the optimal policy.
To the best of our knowledge, this is the first convergence result for point-based POMDPs with continuous action space.
Acknowledgements
The authors thank D. Hsu for the discussion and cluster computing usage, L.P. Kaelbling and J.J. Leonard for the discussion, and S. Ong for reading the early drafts of this paper. This work is funded by the Singapore NRF through SMART, CENSAM.
Additional information
Most of the work was done while H. Kurniawati was with the Center for Environmental Sensing and Modeling, Singapore–MIT Alliance for Research and Technology.
Appendix
To prove Theorem 1 and Theorem 2, we use the α-function as the policy representation. The policy π is represented by a set Γ of α-functions, where \(\pi(b) = \arg\max_{\alpha\in\Gamma} \int_{s\in S} \alpha(s)\, b(s)\, ds\). Each α-function corresponds to a policy tree \(T_{\alpha}\). Each node in \(T_{\alpha}\) corresponds to an action and each edge corresponds to an observation. The value α(s) is the expected total reward of executing \(T_{\alpha}\) from s. Let \(a_{0}\) denote the root of \(T_{\alpha}\), and let us use the same notation to denote a node and its corresponding action. Then, executing \(T_{\alpha}\) starting from s means that the robot at state s starts execution by performing \(a_{0}\). An arc from \(a_{0}\) to a node at the next level of \(T_{\alpha}\) is followed, based on the observation perceived. Suppose the arc points to node \(a_{1}\); then at the next step, the robot performs \(a_{1}\). This process is repeated until a leaf node is reached. The value α(s) can be written as
\[ \alpha(s) = R(s, a_{0}) + \gamma \int_{s'\in S} T(s, a_{0}, s') \int_{o\in O} Z(s', a_{0}, o)\, \alpha_{a_{0}o}(s')\, do\, ds' \tag{7} \]
where \(\alpha_{a_{0}o}\) is the α-function that corresponds to the sub-tree of \(T_{\alpha}\) whose root is the child of \(a_{0}\) via edge o.
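As a concrete illustration of this representation, the following is a minimal sketch (all names hypothetical, not the paper's code) of selecting an action from a set of α-functions, with the integral \(\int_{s\in S}\alpha(s)\,b(s)\,ds\) approximated by a weighted-particle belief:

```python
# Hedged sketch: a policy as a set of alpha-functions, each evaluated
# against a particle-approximated belief. Names are illustrative only.

def policy_action(alphas, actions, particles, weights):
    """Return the action whose alpha-function maximizes
    sum_i w_i * alpha(s_i), a particle approximation of
    the integral of alpha(s) * b(s) over S."""
    values = [sum(w * alpha(s) for s, w in zip(particles, weights))
              for alpha in alphas]
    best = max(range(len(values)), key=lambda i: values[i])
    return actions[best], values[best]

# Toy 1-D example: two alpha-functions over a scalar state.
alphas = [lambda s: 1.0 - abs(s),   # value of a hypothetical "go" policy tree
          lambda s: 0.5]            # value of a hypothetical "stay" policy tree
actions = ["go", "stay"]
particles = [-0.1, 0.0, 0.1]        # sampled states
weights = [0.25, 0.5, 0.25]         # belief weights, sum to 1

a, v = policy_action(alphas, actions, particles, weights)
```

Here the "go" α-function wins with expected value 0.95, mirroring the argmax over Γ above.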
A.1 Proof of Theorem 1
To prove the theorem, we first show that for any α-function, α(s)=α(g(s)). For this purpose, we show that for any function f:S→S that does not change the robot's relative configuration with respect to the environment, α(s)=α(f(s)). Since g is an instance of such a function, α(s)=α(g(s)) too.
We prove α(s)=α(f(s)) by induction on the levels of \(T_{\alpha}\). When \(T_{\alpha}\) has only one level, \(\alpha(s)=R(s,a_{0})\). Since the reward function depends only on the relative configuration of the robot, and since applying f to s does not change this relative configuration, \(\alpha(s)=R(s,a_{0})=R(f(s),a_{0})=\alpha(f(s))\). Assume that for any f, any α, and any s∈S, α(s)=α(f(s)) when \(T_{\alpha}\) has i levels. Now, we show that α(s)=α(f(s)) when \(T_{\alpha}\) has (i+1) levels. The key is to show that applying f to s does not change the integration term in (7). Let us first look at the transition function. Based on property-1 of LS-continuous, we have \(T(s, a_{0}, s_{1}) = T(f(s), a_{0}, s_{1}')\), where \(s_{1}' = s_{1} + (f(s)-s)\). Since the displacement vector (f(s)−s) does not change the robot's relative configuration with respect to the environment, the robot's relative configuration at \(s_{1}\) is the same as that at \(s_{1}'\). Since Z depends only on the robot's relative configuration, \(Z(s_{1}, a_{0}, o) = Z(s_{1}', a_{0}, o)\) for any o∈O. Using the result from level i, \(\alpha(s_{1}) = \alpha(s_{1}')\). Hence, the integration term of (7) is the same for α(s) and α(f(s)). This result, together with the fact that the reward function depends only on the robot's relative configuration, gives us α(s)=α(f(s)).

Now, we prove that for any policy π, \(V_{\pi}(b)=V_{\pi}(\mathrm{Transform}(b))\). Let \(\mathcal{R}\) be a partition of S such that each set \(R_{s} \in\mathcal{R}\) consists of all states in S where the robot's relative configuration with respect to the environment is the same as that of s. This means that each state in the same set of \(\mathcal{R}\) has the same α-value, and hence we can write,
From the definition of Transform in (3),
Therefore, (8) can be rewritten as
which is the result we want.
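The base case of the induction can be checked on a toy example. The sketch below (illustrative names only, not the paper's code) uses a 1-D "environment frame": a reward that depends only on the robot's configuration relative to the environment is unchanged by a map f that shifts the robot and the environment frame together.

```python
# Toy check of the invariance argument: if the reward depends only on the
# robot's configuration *relative* to the environment, applying a map f
# that preserves the relative configuration leaves the value unchanged.
# All names and constants are illustrative, not the paper's code.

GOAL_OFFSET = 2.0                       # hypothetical goal, in relative coords

def reward(s, env_origin):
    """Reward depends only on s - env_origin (relative configuration)."""
    rel = s - env_origin
    return 1.0 if abs(rel - GOAL_OFFSET) < 0.5 else 0.0

def f(s, shift=5.0):
    """A map f: S -> S shifting the robot (with the environment frame
    shifted by the same amount, so the relative configuration is fixed)."""
    return s + shift

s, env = 1.8, 0.0
same = reward(s, env) == reward(f(s), env + 5.0)   # alpha(s) == alpha(f(s)) analogue
```

This is exactly the one-level case α(s)=R(s,a₀)=R(f(s),a₀); the induction extends it level by level through T and Z.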
A.2 Proof of Theorem 2
To prove Theorem 2, we first need the following lemma.
Lemma 1
In an LS-continuous POMDP with parameter \((K_{RS}, K_{Z})\) and a normalized observation space, for any α-function and any states s,s′∈S, \(|\alpha(s)-\alpha(s')| \leq (\frac{K_{RS}}{1-\gamma} + \frac{\gamma K_{Z} R_{\mathit{max}}}{(1-\gamma)^{2}} ) D_{S}(s, s')\).
Proof
Using the definition of α value in (7) and the triangle inequality, we have
Based on property-3 of LS-continuous, we can bound the first absolute term on the right-hand side of (9) as \(|R(s,a_{0})-R(s',a_{0})| \leq K_{RS} D_{S}(s,s')\).
Now, we bound the second absolute term on the right-hand side of (9). Let d=s′−s. Property-1 of LS-continuous gives us \(T(s,a_{0},s_{1})=T(s',a_{0},s_{1}+d)\). Hence, we can rewrite the last absolute term in (9) as,
Using property-2 of LS-continuous, we have \(Z(s_{1}+d,a_{0},o) \geq Z(s_{1},a_{0},o) - K_{Z} \cdot D_{S}(s_{1},s_{1}+d)\).
Substituting the above bounds to (9) gives
The last inequality holds after \(\alpha_{a_{0}o}\) is expanded recursively, assuming that O is normalized. □
Now, we prove Theorem 2. Let \(V_{\pi}(b)=\alpha\cdot b\) and \(V_{\pi}(b')=\alpha'\cdot b'\). Then, there must always be a point \(b_{c}=ab+(1-a)b'\) such that \(\alpha\cdot b_{c}=\alpha'\cdot b_{c}\), since \(\alpha\cdot b\geq\alpha'\cdot b\) and \(\alpha'\cdot b'\geq\alpha\cdot b'\).
Suppose f is the joint density function used in computing \(W_{D}(b,b_{c})\), with \(b(s)=\int_{s'\in S} f(s,s')\,ds'\) and \(b_{c}(s')=\int_{s\in S} f(s,s')\,ds\). Similarly, suppose g is the joint density function used in computing \(W_{D}(b_{c},b')\), with \(b_{c}(s)=\int_{s'\in S} g(s,s')\,ds'\) and \(b'(s')=\int_{s\in S} g(s,s')\,ds\). Then, we can rewrite (10) as,
Substituting the difference between α values in the above inequality with the result of Lemma 1, and using the definition of the Wasserstein distance, gives us,
Using the convexity property of W D , we get the desired result.
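For concreteness, the Wasserstein (earth mover's) distance \(W_{D}\) between two discrete 1-D beliefs can be sketched via the CDF formula \(W_{1}=\int |F_{b}(x)-F_{b'}(x)|\,dx\). This is a minimal illustration with a hypothetical support, not the paper's implementation:

```python
# Hedged sketch: 1-D Wasserstein distance between two discrete beliefs
# on a shared, sorted support, via the cumulative-distribution formula.

def wasserstein_1d(xs, p, q):
    """xs: sorted support points; p, q: weights, each summing to 1."""
    dist, cdf_gap = 0.0, 0.0
    for i in range(len(xs) - 1):
        cdf_gap += p[i] - q[i]                 # running CDF difference
        dist += abs(cdf_gap) * (xs[i + 1] - xs[i])
    return dist

xs = [0.0, 1.0, 2.0]
b  = [0.5, 0.5, 0.0]     # belief b
bp = [0.0, 0.5, 0.5]     # belief b'
d = wasserstein_1d(xs, b, bp)
```

Here half the probability mass moves a distance of 2, so d = 1.0; Theorem 2 scales such a distance by the Lipschitz constant from Lemma 1 to bound the value difference.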
A.3 Proof of Theorem 3

To prove Theorem 3, we first need the following lemma, which bounds the error generated by a single backup operation.
Lemma 2
Consider a POMDP that satisfies LS-continuous with parameter \((K_{RS},K_{Z})\) and LA-continuous with parameter \((K_{RA},h)\). Suppose the sampling dispersion in each element of \(\mathcal{P}\) is \(\leq\delta_{A}\). Then, the error generated by a single simplified GCS backup at a belief b is bounded as \(|H V(b)-\hat{H_{b}} V(b)| \leq K_{RA} \delta_{A} + \gamma (\frac{K_{RS}}{1-\gamma}+\frac{K_{Z} R_{\mathit{max}}}{(1-\gamma)^{2}} ) h(\delta_{A})\).
Proof
To shorten the proof, let us use the Q-value notation. For any b∈B and any a∈A,
The single backup error is then,
where \(\operatorname{Samp}(P)\) is the sampled representation of \(P \in\mathcal{P} \),
Let us compute \(Q(b, a_{P}^{*})-Q(b, \hat{a}_{P}^{*})\) for an element P of \(\mathcal{P}\). For compactness, we drop the subscript P. Using (11) and the triangle inequality, we get
Using property-3 of LA-continuous, we bound the first term on the right-hand side as \(\int_{s\in S} |R(s,a^{*})-R(s,\hat{a}^{*})|\,b(s)\,ds \leq K_{RA}\delta_{A}\). Using property-1 of LA-continuous, \(T(s, a^{*}, s') = T(s, \hat{a}^{*}, s'+f(a^{*}, \hat{a}^{*}))\). Therefore, (13) can be rewritten as,
Since a ∗ and \(\hat{a}^{*}\) belong to the same element of \(\mathcal{P}\), using property-2 of LA-continuous, we get \(Z(s'+f(a^{*}, \hat{a}^{*}),\allowbreak \hat{a}^{*}, o) = Z(s'+f(a^{*}, \hat{a}^{*}), a^{*}, o)\). Using property-2 of LS-continuous, we get \(Z(s'+f(a^{*}, \hat {a}^{*}), a^{*}, o) \geq Z(s', a^{*}, o) - {K_{Z}} {D_{S}(s', s'+f(a^{*}, \hat{a}^{*}))} \). Using these properties and the assumption that O is normalized, rearranging the above inequality gives us
The last inequality holds based on three properties: (1) any α value does not exceed \(\frac{{R_{\mathit{max}}}}{1-\gamma }\), (2) property-1 of LA-continuous, i.e., \({D_{S}(s', s'+f(a^{*}, \hat {a}^{*}))} \leq h({D_{P}(a^{*}, \hat{a}^{*})})\), and (3) Lemma 1.
Using the above inequality, the fact that for any \({P} \in\mathcal{P}\), \(D_{P}(a_{P}^{*}, \hat{a}_{P}^{*}) \leq{\delta_{A}}\), and the fact that h is an increasing function, we get \(\vert H {V(b)}-{\hat{H_{b}}} {V(b)} \vert \leq K_{RA} \delta_{A} + \gamma ( \frac{K_{RS}}{1-\gamma} + \frac{ K_{Z} R_{\mathit{max}}}{(1-\gamma)^{2}} ) h(\delta_{A})\). □
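The \(K_{RA}\delta_{A}\) term has a simple interpretation: maximizing a Lipschitz Q-function over a sampled action set with dispersion \(\delta_{A}\) loses at most \(K_{RA}\delta_{A}\). The sketch below illustrates this with a hypothetical 1-D Q-function (not the paper's planner):

```python
# Illustrative sketch: approximating a max over a continuous action set
# by a finite sample with dispersion delta_A. For a K-Lipschitz Q(b, .),
# the sampled max is within K * delta_A of the true max -- the
# K_RA * delta_A term in Lemma 2. All names are hypothetical.

def sampled_max(q, lo, hi, n):
    """Maximize q over n evenly spaced sample actions in [lo, hi].
    Returns the best sampled action, its value, and the dispersion."""
    step = (hi - lo) / (n - 1)
    best_a = max((lo + i * step for i in range(n)), key=q)
    return best_a, q(best_a), step / 2.0     # dispersion delta_A = step/2

q = lambda a: 1.0 - abs(a - 0.37)            # hypothetical Q(b, .), K = 1
a_hat, q_hat, delta = sampled_max(q, 0.0, 1.0, 11)
gap = 1.0 - q_hat                            # true max of q is 1.0 at a = 0.37
```

With 11 samples the dispersion is 0.05, and the observed gap (0.03) respects the K·δ_A bound.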
Now, we can prove Theorem 3. The difference between the optimal value V^∗ and the value V_t computed by the simplified GCS after t steps is,
Applying Theorem 2 to V ∗ and V t , and bounding |V ∗(b′)−V t (b′)|≤ε t , we get
To compute ε t , notice that V ∗(b′)=HV ∗(b′) and \(V_{t}(b') \leq\hat{H_{b}} V_{t-1}(b')\). Hence, \(\vert V^{*}(b') - V_{t}(b')\vert \leq \vert HV^{*}(b')- \hat{H_{b}} V_{t-1}(b')\vert \) and the following holds
Using the contraction property of H and (14), we can bound the first absolute term on the right hand side of (15) as \(\vert HV^{*}(b')-HV_{t-1}(b')\vert \leq\gamma (4 (\frac {{K_{RS}}}{1-\gamma}+\frac{K_{Z} R_{\mathit{max}}}{(1-\gamma)^{2}} )\delta_{B} + \varepsilon _{t-1} )\). The last absolute term of (15) can be bounded using Lemma 2. As a result, we get
Expanding the recursion gives us,
which is the result we want.
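The geometric expansion at the end of the proof can be checked numerically. The constants below are arbitrary illustrative values, with c standing in for the per-step sampling-error terms (the \(\delta_{B}\) and \(\delta_{A}\) terms):

```python
# Numeric check of the expanded recursion eps_t = gamma * eps_{t-1} + c:
# iterating contracts eps_t toward the fixed point c / (1 - gamma), so
# the total error settles at a constant multiple of the per-step
# sampling error. gamma, c, and eps_0 are illustrative values only.

gamma, c = 0.9, 0.05
eps = 1.0                          # eps_0: initial value-function error
for t in range(200):
    eps = gamma * eps + c
fixed_point = c / (1.0 - gamma)    # limit of the geometric series
```

After 200 iterations eps is within 1e-6 of c/(1−γ) = 0.5, matching the closed-form limit of the expansion.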
Kurniawati, H., Bandyopadhyay, T. & Patrikalakis, N.M. Global motion planning under uncertain motion, sensing, and environment map. Auton Robot 33, 255–272 (2012). https://doi.org/10.1007/s10514-012-9307-y