Beyond Geometric Path Planning: Learning Context-Driven Trajectory Preferences via Sub-optimal Feedback

Jain, Ashesh; Sharma, Shikhar; Saxena, Ashutosh

doi:10.1007/978-3-319-28872-7_19

Part of the book series: Springer Tracts in Advanced Robotics (STAR, volume 114)

Abstract

We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots. The preferences we learn are more intricate than those arising from simple geometric constraints on the robot's trajectory, such as the distance of the robot from a human. Our preferences are instead governed by the surrounding context of various objects and human interactions in the environment. Such preferences make the problem challenging because the criterion for a good trajectory now varies with the task, with the environment, and across users. Furthermore, demonstrating optimal trajectories (e.g., learning from an expert's demonstrations) is often challenging and non-intuitive on high degree-of-freedom manipulators. In this work, we propose an approach that requires a non-expert user only to incrementally improve the trajectory currently proposed by the robot. We implement our algorithm on two high degree-of-freedom robots, PR2 and Baxter, and present three intuitive mechanisms for providing such incremental feedback. In our experimental evaluation we consider two context-rich settings, household chores and grocery store checkout, and show that users are able to train the robot with just a few rounds of feedback (taking only a few minutes). Despite receiving sub-optimal feedback from non-expert users, our algorithm enjoys theoretical bounds on regret that match the asymptotic rates of optimal trajectory algorithms.
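The incremental-feedback idea in the abstract can be illustrated with a preference-perceptron style update in the spirit of coactive learning (Shivaswamy and Joachims, listed in the references). The Python sketch below is an assumption for illustration, not the chapter's implementation: the linear score, the toy feature vectors, and the function names `best_trajectory` and `preference_perceptron_step` are all hypothetical.

```python
import numpy as np

def best_trajectory(w, candidate_features):
    """Return the index of the candidate with the highest linear score w . phi."""
    scores = candidate_features @ w
    return int(np.argmax(scores))  # ties resolve to the first candidate

def preference_perceptron_step(w, phi_proposed, phi_improved):
    """Shift the weights toward the features of the user's improved trajectory."""
    return w + (phi_improved - phi_proposed)

# Toy feature vectors for three candidate trajectories (made-up numbers).
candidates = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
])

w = np.zeros(2)                              # start with no preference
proposed = best_trajectory(w, candidates)    # robot proposes a trajectory

# Hypothetical feedback: the user only nudges the proposal toward
# candidate 2, which is better than the proposal but not optimal.
improved = 2
w = preference_perceptron_step(w, candidates[proposed], candidates[improved])

next_proposal = best_trajectory(w, candidates)
print(proposed, w, next_proposal)
```

In this toy run the user never supplies the best trajectory, only a slightly better one, yet the updated weights already rank candidate 1 highest on the next round.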

This work was done when Sharma was an intern at Cornell University.

Notes

1. A kitchen knife originating in Japan.
2. When RRT becomes too slow, we switch to a more efficient bidirectional-RRT. The cost function (or its approximation) we learn can be fed to trajectory optimizers like CHOMP [39] or optimal planners like RRT* [23] to produce reasonably good trajectories.
3. Consider the following analogy. In search engine results, it is much harder for the user to provide the best web pages for each query, but it is easier to provide a relative ranking of the search results by clicking.
4. Similar results were obtained with the nDCG@1 metric; they are not included here due to space constraints.
5. The smaller number of users on PR2 is because it requires users with experience in Rviz-ROS. We also observed that users found it harder to correct trajectory waypoints in a simulator than to provide zero-G feedback on the robot. For the same reason, we report training time only on Baxter for the grocery store setting.
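Note 4 refers to the nDCG@k ranking metric. As a reminder of how it is computed, here is a minimal Python sketch. It assumes the linear-gain variant of DCG with a log2 rank discount (other variants use an exponential gain), and the relevance scores in the usage lines are made-up numbers, not results from the chapter.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG@k normalized by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical user scores for trajectories in the order the robot ranked them.
ranked_scores = [3, 5, 1]
print(ndcg_at_k(ranked_scores, 1))  # 3 / 5 = 0.6
print(ndcg_at_k(ranked_scores, 3))
```

With k = 1 the metric only checks how good the top-ranked trajectory is relative to the best available one, which is why it is a natural companion to a top-3 variant.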

References

1. Abbeel, P., Coates, A., Ng, A.Y.: Autonomous helicopter aerobatics through apprenticeship learning. IJRR 29(13) (2010)
2. Akgun, B., Cakmak, M., Jiang, K., Thomaz, A.L.: Keyframe-based learning from demonstration. IJSR 4(4), 343–355 (2012)
3. Alterovitz, R., Siméon, T., Goldberg, K.: The stochastic motion roadmap: A sampling framework for planning with Markov motion uncertainty. In: RSS (2007)
4. Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robot. Autonom. Syst. 57(5), 469–483 (2009)
5. Berenson, D., Abbeel, P., Goldberg, K.: A robot path planning framework that learns from experience. In: ICRA (2012)
6. Berg, J.V.D., Abbeel, P., Goldberg, K.: LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information. In: RSS (2010)
7. Bhattacharya, S., Likhachev, M., Kumar, V.: Identification and representation of homotopy classes of trajectories for search-based path planning in 3D. In: RSS (2011)
8. Bischoff, R., Kazi, A., Seyfarth, M.: The morpha style guide for icon-based programming. In: Proceedings of the 11th IEEE International Workshop on RHIC (2002)
9. Calinon, S., Guenter, F., Billard, A.: On learning, representing, and generalizing a task in a humanoid robot. In: IEEE Transactions on Systems Man and Cybernetics (2007)
10. Cohen, B.J., Chitta, S., Likhachev, M.: Search-based planning for manipulation with motion primitives. In: ICRA (2010)
11. Dey, D., Liu, T.Y., Hebert, M., Bagnell, J.A.: Contextual sequence prediction with application to control library optimization. In: RSS (2012)
12. Diankov, R.: Automated Construction of Robotic Manipulation Programs. Ph.D. thesis, CMU, RI (2010)
13. Dragan, A., Srinivasa, S.: Generating legible motion. In: RSS (2013)
14. Dragan, A., Lee, K., Srinivasa, S.: Legibility and predictability of robot motion. In: HRI (2013)
15. Erickson, L.H., LaValle, S.M.: Survivability: Measuring and ensuring path diversity. In: ICRA (2009)
16. Gossow, D., Leeper, A., Hershberger, D., Ciocarlie, M.: Interactive markers: 3-D user interfaces for ROS applications [ROS topics]. IEEE Robot. Autom. Mag. 18(4), 14–15 (2011)
17. Green, C.J., Kelly, A.: Toward optimal sampling in the space of paths. In: ISRR (2007)
18. Hovland, G.E., Sikka, P., McCarragher, B.J.: Skill acquisition from human demonstration using a hidden Markov model. In: ICRA (1996)
19. Jain, A., Wojcik, B., Joachims, T., Saxena, A.: Learning trajectory preferences for manipulators via iterative improvement. In: NIPS (2013)
20. Jiang, Y., Lim, M., Zheng, C., Saxena, A.: Learning to place new objects in a scene. IJRR 31(9) (2012)
21. Joachims, T.: Training linear SVMs in linear time. In: KDD (2006)
22. Joachims, T., Finley, T., Yu, C.: Cutting-plane training of structural SVMs. Mach. Learn. 77(1) (2009)
23. Karaman, S., Frazzoli, E.: Incremental sampling-based algorithms for optimal motion planning. In: RSS (2010)
24. Klingbeil, E., Rao, D., Carpenter, B., Ganapathi, V., Ng, A.Y., Khatib, O.: Grasping with application to an autonomous checkout robot. In: ICRA (2011)
25. Kober, J., Peters, J.: Policy search for motor primitives in robotics. Mach. Learn. 84(1) (2011)
26. Koppula, H.S., Saxena, A.: Anticipating human activities using object affordances for reactive robotic response. In: RSS (2013)
27. LaValle, S.M., Kuffner, J.J.: Randomized kinodynamic planning. IJRR 20(5), 378–400 (2001)
28. Lenz, I., Lee, H., Saxena, A.: Deep learning for detecting robotic grasps. In: RSS (2013)
29. Levine, S., Koltun, V.: Continuous inverse optimal control with locally optimal examples. In: ICML (2012)
30. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, vol. 1. Cambridge University Press, Cambridge (2008)
31. Nicolescu, M.N., Mataric, M.J.: Natural methods for robot task learning: Instructive demonstrations, generalization and practice. In: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (2003)
32. Nikolaidis, S., Shah, J.: Human-robot teaming using shared mental models. In: HRI, Workshop on Human-Agent-Robot Teamwork (2012)
33. Nikolaidis, S., Shah, J.: Human-robot cross-training: Computational formulation, modeling and evaluation of a human team training strategy. In: IEEE/ACM ICHRI (2013)
34. Phillips, M., Cohen, B., Chitta, S., Likhachev, M.: E-graphs: Bootstrapping planning with experience graphs. In: RSS (2012)
35. Raman, K., Joachims, T.: Learning socially optimal information systems from egoistic users. In: Proceedings of the ECML (2013)
36. Ratliff, N.: Learning to search: Structured prediction techniques for imitation learning. Ph.D. thesis, CMU, RI (2009)
37. Ratliff, N., Bagnell, J.A., Zinkevich, M.: Maximum margin planning. In: ICML (2006)
38. Ratliff, N., Silver, D., Bagnell, J.A.: Learning to search: Functional gradient techniques for imitation learning. Autonom. Robot. 27(1), 25–53 (2009a)
39. Ratliff, N., Zucker, M., Bagnell, J.A., Srinivasa, S.: CHOMP: Gradient optimization techniques for efficient motion planning. In: ICRA (2009b)
40. Saxena, A., Driemeyer, J., Ng, A.Y.: Robotic grasping of novel objects using vision. IJRR 27(2) (2008)
41. Shivaswamy, P., Joachims, T.: Online structured prediction via coactive learning. In: ICML (2012)
42. Shneiderman, B., Plaisant, C.: Designing the User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley (2010)
43. Stopp, A., Horstmann, S., Kristensen, S., Lohnert, F.: Towards interactive learning for manufacturing assistants. In: Proceedings of the 10th IEEE International Workshop on RHIC (2001)
44. Sucan, I.A., Moll, M., Kavraki, L.E.: The Open Motion Planning Library. IEEE Robot. Autom. Mag. 19(4), 72–82 (2012). http://ompl.kavrakilab.org
45. Yamane, K., Revfi, M., Asfour, T.: Synthesizing object receiving motions of humanoid robots with human motion database. In: ICRA (2013)
46. Vernaza, P., Bagnell, J.A.: Efficient high dimensional maximum entropy modeling via symmetric partition functions. In: NIPS (2012)
47. Wilson, A., Fern, A., Tadepalli, P.: A Bayesian approach for policy learning from trajectory preference queries. In: NIPS (2012)
48. Ziebart, B.D., Maas, A., Bagnell, J.A., Dey, A.K.: Maximum entropy inverse reinforcement learning. In: AAAI (2008)

Acknowledgments

This research was supported by ARO, a Microsoft Faculty Fellowship, and an NSF CAREER award (to Saxena).

Author information

Authors and Affiliations

Cornell University, Department of Computer Science, Ithaca, NY, USA
Ashesh Jain & Ashutosh Saxena
Indian Institute of Technology, Kanpur, India
Shikhar Sharma

Corresponding author

Correspondence to Ashesh Jain.

Editor information

Editors and Affiliations

Creative Informatics, The University of Tokyo, Tokyo, Japan
Masayuki Inaba
School of Electrical Engineering and Com, Queensland Univ of Technology, Brisbane, Queensland, Australia
Peter Corke

About this chapter

Cite this chapter

Jain, A., Sharma, S., Saxena, A. (2016). Beyond Geometric Path Planning: Learning Context-Driven Trajectory Preferences via Sub-optimal Feedback. In: Inaba, M., Corke, P. (eds) Robotics Research. Springer Tracts in Advanced Robotics, vol 114. Springer, Cham. https://doi.org/10.1007/978-3-319-28872-7_19

DOI: https://doi.org/10.1007/978-3-319-28872-7_19
Published: 23 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28870-3
Online ISBN: 978-3-319-28872-7
eBook Packages: Engineering, Engineering (R0)
