Beyond Geometric Path Planning: Learning Context-Driven Trajectory Preferences via Sub-optimal Feedback

  • Ashesh Jain
  • Shikhar Sharma
  • Ashutosh Saxena
Part of the Springer Tracts in Advanced Robotics book series (STAR, volume 114)


We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots. The preferences we learn are more intricate than those arising from simple geometric constraints on the robot’s trajectory, such as the distance of the robot from a human. Instead, our preferences are governed by the surrounding context of various objects and human interactions in the environment. Such preferences make the problem challenging because the criterion for a good trajectory now varies with the task, with the environment, and across users. Furthermore, demonstrating optimal trajectories (e.g., learning from an expert’s demonstrations) is often challenging and non-intuitive on high degree-of-freedom manipulators. In this work, we propose an approach that requires a non-expert user only to incrementally improve the trajectory currently proposed by the robot. We implement our algorithm on two high degree-of-freedom robots, PR2 and Baxter, and present three intuitive mechanisms for providing such incremental feedback. In our experimental evaluation we consider two context-rich settings, household chores and grocery store checkout, and show that users are able to train the robot with just a few rounds of feedback (taking only a few minutes). Despite receiving sub-optimal feedback from non-expert users, our algorithm enjoys theoretical bounds on regret that match the asymptotic rates of optimal trajectory algorithms.
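As a rough illustration of learning from incremental improvements, the sketch below uses a perceptron-style update of the kind found in coactive learning. It is not taken from the paper: the linear scoring function, the two-dimensional feature vectors, and the names `score`, `propose`, and `update` are all assumptions made for illustration. The robot proposes its highest-scoring candidate trajectory, the user points to a somewhat better one (not necessarily the best), and the weights move toward the features of the user's choice.

```python
def score(w, phi):
    """Linear score of a trajectory with (hypothetical) feature vector phi."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def propose(w, candidates):
    """Return the index of the highest-scoring candidate (first one on ties)."""
    return max(range(len(candidates)), key=lambda i: score(w, candidates[i]))


def update(w, phi_proposed, phi_improved):
    """Shift the weights toward the improved trajectory's features."""
    return [wi + (a - b) for wi, a, b in zip(w, phi_improved, phi_proposed)]


# Hypothetical feature vectors for three candidate trajectories.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w = [0.0, 0.0]

first = propose(w, candidates)
# Assumed feedback: the user indicates that candidate 1 improves on the proposal.
w = update(w, candidates[first], candidates[1])
second = propose(w, candidates)
```

In this toy run a single round of feedback is enough to change the robot's proposal; with richer features and many contexts, more rounds would be needed.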



This research was supported by ARO, a Microsoft Faculty Fellowship, and an NSF CAREER Award (to Saxena).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Cornell University, Department of Computer Science, Ithaca, USA
  2. Indian Institute of Technology, Kanpur, India
