
Global motion planning under uncertain motion, sensing, and environment map


Abstract

Uncertainty in motion planning is often caused by three main sources: motion error, sensing error, and an imperfect environment map. Despite the significant effect of all three sources of uncertainty on motion planning problems, most planners take into account only one or at most two of them. We propose a new motion planner, called Guided Cluster Sampling (GCS), that takes into account all three sources of uncertainty for robots with active sensing capabilities. GCS uses the Partially Observable Markov Decision Process (POMDP) framework and the point-based POMDP approach. Although point-based POMDP solvers have shown impressive progress over the past few years, they perform poorly when the environment map is imperfect. This poor performance is due to the extremely high-dimensional state space, which translates to an extremely large belief space B.

We alleviate this problem by constructing a more suitable sampling distribution, based on the observations that when the robot has active sensing capability, B can be partitioned into a collection of much smaller sub-spaces, and that an optimal policy can often be generated by sufficiently sampling a small subset of the collection. Utilizing these observations, GCS samples B in two stages: a sub-space is sampled from the collection, and then a belief is sampled from that sub-space.
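As a purely illustrative sketch of this two-stage idea (the class, field names, and weighting scheme below are hypothetical placeholders, not the GCS implementation):

```python
import random

# Illustrative sketch only: a "sub-space" of the belief space B is represented by
# an object that can sample a belief from itself, plus a weight that stands in for
# whatever guidance information the planner maintains about that sub-space.

class Subspace:
    def __init__(self, name, sample_belief, weight=1.0):
        self.name = name                    # identifier of the sub-space of B
        self.sample_belief = sample_belief  # callable returning a belief in this sub-space
        self.weight = weight                # guidance score, updated from earlier samples

def two_stage_sample(subspaces):
    """Stage 1: pick a sub-space from the collection; Stage 2: sample a belief in it."""
    chosen = random.choices(subspaces, weights=[s.weight for s in subspaces])[0]
    return chosen, chosen.sample_belief()
```

A planner following this pattern would then update the weights of the sampled sub-spaces based on the beliefs obtained so far, which is the guiding described next.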

It uses information from the set of sampled sub-spaces and sampled beliefs to guide subsequent sampling. Simulation results on marine robotics scenarios suggest that GCS can generate reasonable policies for motion planning problems with uncertain motion, sensing, and environment map that are unsolvable by the best point-based POMDP solvers today. Furthermore, GCS handles POMDPs with continuous state, action, and observation spaces. We show that, for a class of POMDPs that occurs often in robot motion planning, given enough time, GCS converges to the optimal policy.

To the best of our knowledge, this is the first convergence result for point-based POMDPs with continuous action space.



Notes

  1. The proofs of all theorems are in the Appendix.

  2. See footnote 1.

  3. See footnote 1.


Acknowledgements

The authors thank D. Hsu for the discussion and cluster computing usage, L.P. Kaelbling and J.J. Leonard for the discussion, and S. Ong for reading the early drafts of this paper. This work is funded by the Singapore NRF through SMART, CENSAM.

Author information


Corresponding author

Correspondence to Hanna Kurniawati.

Additional information

Most of the work was done while H. Kurniawati was with the Center for Environmental Sensing and Modeling, Singapore–MIT Alliance for Research and Technology.

Appendix

To prove Theorem 1 and Theorem 2, we use α-functions as the policy representation. The policy π is represented by a set of α-functions Γ, where \(\pi(b)=\operatorname{argmax}_{\alpha \in \Gamma} \int_{s \in S} \alpha(s)\, b(s)\, ds\). Each α-function corresponds to a policy tree \(T_{\alpha}\). Each node in \(T_{\alpha}\) corresponds to an action and each edge corresponds to an observation. The value α(s) is the expected total reward of executing \(T_{\alpha}\) from s. Let \(a_{0}\) denote the root of \(T_{\alpha}\), and let us use the same notation to denote a node and its corresponding action. Then, executing \(T_{\alpha}\) starting from s means that the robot at state s starts execution by performing \(a_{0}\). An arc from \(a_{0}\) to a node at the next level of \(T_{\alpha}\) is followed, based on the observation perceived. Suppose the arc points to node \(a_{1}\); then at the next step, the robot performs \(a_{1}\). This process is repeated until a leaf node is reached. The value α(s) can be written as

$$\alpha(s) = R(s, a_{0}) + \gamma \int_{s_{1} \in S} T(s, a_{0}, s_{1}) \int_{o \in O} Z(s_{1}, a_{0}, o)\, \alpha_{a_{0}o}(s_{1})\, do\, ds_{1} \quad (7)$$

where \(\alpha_{a_{0}o}\) is the α-function that corresponds to the sub-tree of \(T_{\alpha}\) whose root is the child of \(a_{0}\) via edge o.

A.1 Proof of Theorem 1

To prove the theorem, we first show that for any α-function, α(s)=α(g(s)). For this purpose, we show that for any function \(f: S \rightarrow S\) that does not change the robot's relative configuration with respect to the environment, α(s)=α(f(s)). Since g is an instance of such a function, α(s)=α(g(s)), too.

We prove α(s)=α(f(s)) by induction on the levels of \(T_{\alpha}\). When \(T_{\alpha}\) has only one level, \(\alpha(s)=R(s,a_{0})\). Since the reward function depends only on the relative configuration of the robot, and since applying f to s does not change this relative configuration, \(\alpha(s)=R(s,a_{0})=R(f(s),a_{0})=\alpha(f(s))\). Assume that for any f, any α, and any \(s \in S\), α(s)=α(f(s)) when \(T_{\alpha}\) has i levels. Now, we show that α(s)=α(f(s)) when \(T_{\alpha}\) has (i+1) levels. The key is to show that applying f to s does not change the integration term in (7). Let us first look at the transition function. Based on property-1 of LS-continuous, we have \(T(s, a_{0}, s_{1}) = T(f(s), a_{0}, s_{1}')\), where \(s_{1}' = s_{1} + (f(s)-s)\). Since the displacement vector (f(s)−s) does not change the robot's relative configuration with respect to the environment, the robot's relative configuration at \(s_{1}\) is the same as that at \(s_{1}'\). Since Z depends only on the robot's relative configuration, \(Z(s_{1}, a_{0}, o) = Z(s_{1}', a_{0}, o)\) for any \(o \in O\). Using the result from level i, \(\alpha_{a_{0}o}(s_{1}) = \alpha_{a_{0}o}(s_{1}')\). Hence, the integration terms of (7) for α(s) and α(f(s)) are the same. This result, together with the fact that the reward function depends only on the robot's relative configuration, gives us α(s)=α(f(s)).

Now, we prove that for any policy π, \(V_{\pi}(b)=V_{\pi}(\mathit{Transform}(b))\). Let \(\mathcal{R}\) be a partition of S, such that each set \(R_{s} \in \mathcal{R}\) consists of all states in S where the robot's relative configuration with respect to the environment is the same as that of s. This means that each state in the same set of \(\mathcal{R}\) has the same α-value, and hence we can write

(8)

From the definition of Transform in (3),

$$(\mathit{Transform}(b))(s) = \int_{s' \in R_s} b(s')\, ds'. $$

Therefore, (8) can be rewritten as

which is the result we want.
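For concreteness, the induction step in the first half of this proof can be written out compactly using (7); this is only a restatement of the argument above, substituting \(s_{1}' = s_{1} + (f(s)-s)\) (a translation, which leaves the integration measure unchanged) and using the stated equalities for T, Z, and \(\alpha_{a_{0}o}\):

$$\begin{aligned} \alpha(f(s)) &= R(f(s), a_{0}) + \gamma \int_{s_{1}' \in S} T(f(s), a_{0}, s_{1}') \int_{o \in O} Z(s_{1}', a_{0}, o)\, \alpha_{a_{0}o}(s_{1}')\, do\, ds_{1}' \\ &= R(s, a_{0}) + \gamma \int_{s_{1} \in S} T(s, a_{0}, s_{1}) \int_{o \in O} Z(s_{1}, a_{0}, o)\, \alpha_{a_{0}o}(s_{1})\, do\, ds_{1} = \alpha(s). \end{aligned}$$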

A.2 Proof of Theorem 2

To prove Theorem 2, we first need the following lemma.

Lemma 1

In an LS-continuous POMDP with parameter \((K_{RS}, K_{Z})\) and a normalized observation space, for any α-function and any states \(s, s' \in S\), \(|\alpha(s)-\alpha(s')| \leq (\frac{K_{RS}}{1-\gamma} + \frac{\gamma K_{Z} R_{\mathit{max}}}{(1-\gamma)^{2}} ) D_{S}(s, s')\).

Proof

Using the definition of α value in (7) and the triangle inequality, we have

(9)

Based on property-3 of LS-continuous, we can bound the first absolute term on the right hand side of (9) as \(|R(s, a_{0}) - R(s', a_{0})| \leq K_{RS}\, D_{S}(s, s')\).

Now, we bound the second absolute term on the right hand side of (9). Let \(d = s' - s\). Property-1 of LS-continuous gives us \(T(s, a_{0}, s_{1}) = T(s', a_{0}, s_{1}+d)\). Hence, we can rewrite the last absolute term in (9) as

Using property-2 of LS-continuous, we have \(Z(s_{1}+d, a_{0}, o) \geq Z(s_{1}, a_{0}, o) - K_{Z}\, D_{S}(s_{1}, s_{1}+d)\).

Substituting the above bounds into (9) gives

The last inequality holds after \(\alpha_{a_{0}o}\) is expanded recursively, assuming that O is normalized. □
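As a compact summary of where the constant in Lemma 1 comes from (not a replacement for the omitted algebra): writing L for the Lipschitz constant of an α-function, the recursive expansion above, together with the bound \(\alpha \leq \frac{R_{\mathit{max}}}{1-\gamma}\) and the normalization of O, behaves like a geometric series,

$$L \leq K_{RS} + \gamma \frac{K_{Z} R_{\mathit{max}}}{1-\gamma} + \gamma L \quad\Longrightarrow\quad L \leq \frac{K_{RS}}{1-\gamma} + \frac{\gamma K_{Z} R_{\mathit{max}}}{(1-\gamma)^{2}},$$

which is exactly the constant in the statement of Lemma 1.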

Now, we prove Theorem 2. Let \(V_{\pi}(b) = \alpha \cdot b\) and \(V_{\pi}(b') = \alpha' \cdot b'\). Then, there must always be a point \(b_{c} = ab + (1-a)b'\), for some \(a \in [0,1]\), such that \(\alpha \cdot b_{c} = \alpha' \cdot b_{c}\), since \(\alpha \cdot b \geq \alpha' \cdot b\) and \(\alpha' \cdot b' \geq \alpha \cdot b'\). Hence,

$$\big\vert V_{\pi}(b) - V_{\pi}(b')\big\vert = \vert \alpha \cdot b - \alpha' \cdot b'\vert \leq \vert \alpha \cdot b - \alpha \cdot b_{c}\vert + \vert \alpha' \cdot b_{c} - \alpha' \cdot b'\vert \quad (10)$$

Suppose f is the joint density function used in computing \(W_{D}(b, b_{c})\), with \(b(s) = \int_{s' \in S} f(s,s')\, ds'\) and \(b_{c}(s') = \int_{s \in S} f(s,s')\, ds\), and suppose g is the joint density function used in computing \(W_{D}(b_{c}, b')\), with \(b_{c}(s) = \int_{s' \in S} g(s,s')\, ds'\) and \(b'(s') = \int_{s \in S} g(s,s')\, ds\). Then, we can rewrite (10) as

Substituting the difference between α-values in the above inequality with the result of Lemma 1, and using the definition of the Wasserstein distance, gives us

Using the convexity property of \(W_{D}\), we get the desired result.
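As a rough sketch of how these steps combine (the exact constant in the statement of Theorem 2 may differ; this only tracks the structure of the argument): writing \(L = \frac{K_{RS}}{1-\gamma} + \frac{\gamma K_{Z} R_{\mathit{max}}}{(1-\gamma)^{2}}\) for the constant of Lemma 1 and taking f and g to be the minimizing couplings in \(W_{D}\),

$$\vert V_{\pi}(b) - V_{\pi}(b')\vert \leq L \bigl( W_{D}(b, b_{c}) + W_{D}(b_{c}, b') \bigr) \leq L \bigl( (1-a)\, W_{D}(b, b') + a\, W_{D}(b, b') \bigr) = L\, W_{D}(b, b').$$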

A.3 Proof of Theorem 3

To prove Theorem 3, we first need the following lemma that bounds the error generated by a single backup operation.

Lemma 2

Consider a POMDP that satisfies LS-continuous with parameter \((K_{RS}, K_{Z})\) and LA-continuous with parameter \((K_{RA}, h)\). Suppose the sampling dispersion in each element of \(\mathcal{P}\) is \(\delta_{A}\). Then, the error generated by a single simplified GCS backup at a belief b is bounded as \(|H V(b) - \hat{H_{b}} V(b)| \leq K_{RA} \delta_{A} + \gamma (\frac{K_{RS}}{1-\gamma} + \frac{K_{Z} R_{\mathit{max}}}{(1-\gamma)^{2}} ) h(\delta_{A})\).

Proof

To shorten the proof, let us use the Q-value notation. For any \(b \in B\) and any \(a \in A\),

(11)

The single backup error is then,

(12)

where \(\operatorname{Samp}(P)\) is the sampled representation of \(P \in\mathcal{P} \),

Let us compute \(Q(b, a_{P}^{*}) - Q(b, \hat{a}_{P}^{*})\) for an element P of \(\mathcal{P}\). For compactness, we drop the subscript P. Using (11) and the triangle inequality, we get

(13)

Using property-3 of LA-continuous, we bound the first term on the right hand side as \(\int_{s \in S} |R(s, a^{*}) - R(s, \hat{a}^{*})|\, b(s)\, ds \leq K_{RA} \delta_{A}\). Using property-1 of LA-continuous, \(T(s, a^{*}, s') = T(s, \hat{a}^{*}, s'+f(a^{*}, \hat{a}^{*}))\). Therefore, (13) can be rewritten as

Since a and \(\hat{a}^{*}\) belong to the same element of \(\mathcal{P}\), using property-2 of LA-continuous, we get \(Z(s'+f(a^{*}, \hat{a}^{*}),\allowbreak \hat{a}^{*}, o) = Z(s'+f(a^{*}, \hat{a}^{*}), a^{*}, o)\). Using property-2 of LS-continuous, we get \(Z(s'+f(a^{*}, \hat {a}^{*}), a^{*}, o) \geq Z(s', a^{*}, o) - {K_{Z}} {D_{S}(s', s'+f(a^{*}, \hat{a}^{*}))} \). Using these properties and the assumption that O is normalized, rearranging the above inequality gives us

The last inequality holds based on three properties: (1) any α value does not exceed \(\frac{{R_{\mathit{max}}}}{1-\gamma }\), (2) property-1 of LA-continuous, i.e., \({D_{S}(s', s'+f(a^{*}, \hat {a}^{*}))} \leq h({D_{P}(a^{*}, \hat{a}^{*})})\), and (3) Lemma 1.

Using the above inequality, the fact that for any \(P \in \mathcal{P}\), \(D_{P}(a_{P}^{*}, \hat{a}_{P}^{*}) \leq \delta_{A}\), and the fact that h is an increasing function, we get \(\vert H V(b) - \hat{H_{b}} V(b) \vert \leq K_{RA} \delta_{A} + \gamma ( \frac{K_{RS}}{1-\gamma} + \frac{K_{Z} R_{\mathit{max}}}{(1-\gamma)^{2}} ) h(\delta_{A})\). □
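To make the role of the sampled action sets in Lemma 2 concrete, here is a minimal sketch of a backup that maximizes only over sampled actions from each element of the partition; the names (`partitions`, `q_estimate`) are hypothetical, and this is not the authors' GCS implementation:

```python
# Hypothetical sketch of a point-based backup restricted to sampled actions.
# `partitions` maps each element P of the action-space partition to its sampled
# actions Samp(P); `q_estimate(belief, action)` stands in for an estimate of Q(b, a).

def sampled_backup(belief, partitions, q_estimate):
    """Return the best sampled action and its Q-estimate at `belief`.

    The gap to the exact backup max_a Q(b, a) shrinks with the dispersion
    delta_A of the sampled actions within each partition element (cf. Lemma 2).
    """
    best_action, best_value = None, float("-inf")
    for sampled_actions in partitions.values():
        for action in sampled_actions:  # actions in Samp(P)
            value = q_estimate(belief, action)
            if value > best_value:
                best_action, best_value = action, value
    return best_action, best_value
```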

Now, we can prove Theorem 3. The difference between the optimal value \(V^{*}\) and the value \(V_{t}\) computed by the simplified GCS after t steps is

Applying Theorem 2 to \(V^{*}\) and \(V_{t}\), and bounding \(\vert V^{*}(b') - V_{t}(b')\vert \leq \varepsilon_{t}\), we get

(14)

To compute \(\varepsilon_{t}\), notice that \(V^{*}(b') = HV^{*}(b')\) and \(V_{t}(b') \leq \hat{H_{b}} V_{t-1}(b')\). Hence, \(\vert V^{*}(b') - V_{t}(b')\vert \leq \vert HV^{*}(b') - \hat{H_{b}} V_{t-1}(b')\vert\), and the following holds

$$\big\vert HV^{*}(b') - \hat{H_{b}} V_{t-1}(b')\big\vert \leq \big\vert HV^{*}(b') - HV_{t-1}(b')\big\vert + \big\vert HV_{t-1}(b') - \hat{H_{b}} V_{t-1}(b')\big\vert \quad (15)$$

Using the contraction property of H and (14), we can bound the first absolute term on the right hand side of (15) as \(\vert HV^{*}(b')-HV_{t-1}(b')\vert \leq\gamma (4 (\frac {{K_{RS}}}{1-\gamma}+\frac{K_{Z} R_{\mathit{max}}}{(1-\gamma)^{2}} )\delta_{B} + \varepsilon _{t-1} )\). The last absolute term of (15) can be bounded using Lemma 2. As a result, we get

Expanding the recursion gives us,

which is the result we want.
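As a rough illustration of the recursion expansion (assuming the per-step terms gather into a single constant c, whose exact form follows from (14), (15), and Lemma 2 above):

$$\varepsilon_{t} \leq \gamma\, \varepsilon_{t-1} + c \quad\Longrightarrow\quad \varepsilon_{t} \leq \frac{c}{1-\gamma} + \gamma^{t}\, \varepsilon_{0}.$$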


About this article

Cite this article

Kurniawati, H., Bandyopadhyay, T. & Patrikalakis, N.M. Global motion planning under uncertain motion, sensing, and environment map. Auton Robot 33, 255–272 (2012). https://doi.org/10.1007/s10514-012-9307-y

