Learning to search: Functional gradient techniques for imitation learning

Abstract

Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time-consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain and Sammut in Machine intelligence agents. London: Oxford University Press, 1995; Pomerleau in Advances in neural information processing systems 1, 1989; LeCun et al. in Advances in neural information processing systems 18, 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to myopic and poor-quality robot performance.

While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al. in Proceedings of the IEEE-RAS international conference on humanoid robots, 2003) to outdoor unstructured navigation (Kelly et al. in Proceedings of the international symposium on experimental robotics (ISER), 2004; Stentz et al. in AUVSI’s unmanned systems, 2007), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explore learning these functions from expert human demonstration. These algorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert’s demonstration.

The work we present extends the Maximum Margin Planning (MMP) (Ratliff et al. in Twenty second international conference on machine learning (ICML06), 2006a) framework to admit learning of more powerful, non-linear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, more efficient than previous attempts at non-linearization (Ratliff et al. in NIPS, 2006b), more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the function’s form. We derive and discuss the framework both mathematically and intuitively, and demonstrate practical real-world performance with three applied case-studies including legged locomotion, grasp planning, and autonomous outdoor unstructured navigation. The latter study includes hundreds of kilometers of autonomous traversal through complex natural environments. These case-studies address key challenges in applying the algorithm in practical settings that utilize state-of-the-art planners, and which may be constrained by efficiency requirements and imperfect expert demonstration.
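
As a rough illustration of the inverse optimal control idea described above (find a cost function under which the planner reproduces the expert's path), the following is a minimal sketch and not the LEARCH algorithm itself. It assumes a linear cost over hand-picked cell features and a plain subgradient update, with no loss augmentation, regularization, or non-linear functional gradient step. The grid, the features, and the names `plan`, `path_features`, and `learn_cost` are all hypothetical.

```python
import heapq
import numpy as np

def plan(cost, start, goal):
    """Dijkstra on a 4-connected grid; visiting a cell pays that cell's cost."""
    rows, cols = cost.shape
    dist = {start: cost[start]}
    parent = {}
    frontier = [(cost[start], start)]
    while frontier:
        d, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                nd = d + cost[nxt]
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    parent[nxt] = cell
                    heapq.heappush(frontier, (nd, nxt))
    # Walk parent pointers back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

def path_features(features, path):
    """Sum the feature vectors of every cell on the path."""
    return sum(features[cell] for cell in path)

def learn_cost(features, expert_path, steps=50, lr=0.25, min_w=0.01):
    """Adjust linear cost weights until the planner reproduces the expert path.

    Assumed objective: cost(expert path) - cost(cheapest path). Its subgradient
    is the expert's feature counts minus the planned path's feature counts.
    """
    w = np.ones(features.shape[-1])
    start, goal = expert_path[0], expert_path[-1]
    f_expert = path_features(features, expert_path)
    for _ in range(steps):
        planned = plan(features @ w, start, goal)
        if planned == expert_path:
            break
        grad = f_expert - path_features(features, planned)
        # Clamp the weights so every cell cost stays positive for Dijkstra.
        w = np.maximum(w - lr * grad, min_w)
    return w

# Toy 3x3 map: feature 0 is a constant, feature 1 marks one "rough" cell.
features = np.zeros((3, 3, 2))
features[:, :, 0] = 1.0
features[0, 1, 1] = 1.0
# The hypothetical expert detours around the rough cell at (0, 1).
expert_path = [(0, 0), (1, 0), (1, 1), (1, 2), (0, 2)]

learned_w = learn_cost(features, expert_path)
print(learned_w, plan(features @ learned_w, (0, 0), (0, 2)))
```

With the initial weights the planner cuts straight through the rough cell. The update raises the weight on the feature the expert avoided and lowers the constant per-cell weight until the detour becomes the cheapest path.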

References

  1. Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In ICML ’04: Proceedings of the twenty-first international conference on machine learning.

  2. Anderson, B. D. O., & Moore, J. B. (1990). Optimal control: linear quadratic methods. Englewood Cliffs: Prentice Hall.

  3. Argall, B., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems.

  4. Atkeson, C., Schaal, S., & Moore, A. (1995). Locally weighted learning. AI Review.

  5. Bain, M., & Sammut, C. (1995). A framework for behavioral cloning. In Machine intelligence agents. London: Oxford University Press.

  6. Boyd, S., Ghaoui, L. E., Feron, E., & Balakrishnan, V. (1994). Linear matrix inequalities in system and control theory. Society for Industrial and Applied Mathematics (SIAM).

  7. Calinon, S., Guenter, F., & Billard, A. (2007). On learning, representing and generalizing a task in a humanoid robot. In IEEE Transactions on Systems, Man and Cybernetics, Part B. Special issue on robot learning by observation, demonstration and imitation, 37, 286–298.

  8. Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. New York: Cambridge University Press.

  9. Chestnutt, J., Kuffner, J., Nishiwaki, K., & Kagami, S. (2003). Planning biped navigation strategies in complex environments. In Proceedings of the IEEE-RAS international conference on humanoid robots. Karlsruhe, Germany.

  10. Chestnutt, J., Lau, M., Cheng, G., Kuffner, J., Hodgins, J., & Kanade, T. (2005). Footstep planning for the Honda ASIMO humanoid. In Proceedings of the IEEE international conference on robotics and automation.

  11. Donoho, D. L., & Elad, M. (2003). Maximal sparsity representation via l1 minimization. Proceedings of the National Academy of Sciences, 100, 2197–2202.

  12. Ferguson, D., & Stentz, A. (2006). Using interpolation to improve path planning: The field D* algorithm. Journal of Field Robotics, 23, 79–101.

  13. Friedman, J. H. (1999a). Greedy function approximation: A gradient boosting machine. Annals of Statistics.

  14. Gordon, G. (1999). Approximate solutions to Markov decision processes. Doctoral dissertation, Robotics Institute, Carnegie Mellon University.

  15. Hersch, M., Guenter, F., Calinon, S., & Billard, A. (2008). Dynamical system modulation for robot learning via kinesthetic demonstrations. IEEE Transactions on Robotics, 24, 1463–1467.

  16. Jaynes, E. (2003). Probability: The logic of science. Cambridge: Cambridge University Press.

  17. Kalman, R. (1964). When is a linear control system optimal? Transactions of the ASME, Journal of Basic Engineering, 86, 51–60.

  18. Kelly, A., Amidi, O., Happold, M., Herman, H., Pilarski, T., Rander, P., Stentz, A., Vallidis, N., & Warner, R. (2004). Toward reliable autonomous vehicles operating in challenging environments. In Proceedings of the international symposium on experimental robotics (ISER). Singapore.

  19. Kivinen, J., & Warmuth, M. K. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132.

  20. Kolter, J. Z., Abbeel, P., & Ng, A. Y. (2008). Hierarchical apprenticeship learning with application to quadruped locomotion. Neural Information Processing Systems, 20.

  21. Kulesza, A., & Pereira, F. (2008). Structured learning with approximate inference. In Advances in neural information processing systems. Cambridge: MIT.

  22. LeCun, Y., Muller, U., Ben, J., Cosatto, E., & Flepp, B. (2006). Off-road obstacle avoidance through end-to-end learning. In Advances in neural information processing systems (Vol. 18). Cambridge: MIT.

  23. Mason, L., Baxter, J., Bartlett, P., & Frean, M. (1999). Functional gradient techniques for combining hypotheses. In Advances in large margin classifiers. Cambridge: MIT.

  24. Miller, A. T., Knoop, S., Allen, P. K., & Christensen, H. I. (2003). Automatic grasp planning using shape primitives. In Proceedings of the IEEE international conference on robotics and automation.

  25. Munoz, D., Bagnell, J. A. D., Vandapel, N., & Hebert, M. (2009). Contextual classification with functional max-margin Markov networks. In IEEE computer society conference on computer vision and pattern recognition (CVPR).

  26. Munoz, D., Vandapel, N., & Hebert, M. (2008). Directional associative Markov network for 3-d point cloud classification. In Fourth international symposium on 3D data processing, visualization and transmission.

  27. Neu, G., & Szepesvari, C. (2007). Apprenticeship learning using inverse reinforcement learning and gradient methods. In Uncertainty in artificial intelligence (UAI).

  28. Ng, A. Y., & Russell, S. (2000). Algorithms for inverse reinforcement learning. In Proc. 17th international conf. on machine learning.

  29. Pomerleau, D. (1989). ALVINN: An autonomous land vehicle in a neural network. In Advances in neural information processing systems (Vol. 1).

  30. Puterman, M. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.

  31. Ratliff, N., & Bagnell, J. A. (2009). Functional bundle methods. In The Learning workshop. Clearwater Beach, Florida.

  32. Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2006a). Maximum margin planning. In Twenty second international conference on machine learning (ICML06).

  33. Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2007a). (Online) subgradient methods for structured prediction. In Artificial intelligence and statistics. San Juan, Puerto Rico.

  34. Ratliff, N., Bradley, D., Bagnell, J. A., & Chestnutt, J. (2006b). Boosting structured prediction for imitation learning. In NIPS. Vancouver, B.C.

  35. Ratliff, N., Srinivasa, S., & Bagnell, J. A. (2007b). Imitation learning for locomotion and manipulation. In IEEE-RAS international conference on humanoid robots.

  36. Rifkin, R., Yeo, G., & Poggio, T. (2003). Regularized least squares classification. In Advances in learning theory: methods, models and applications. Amsterdam: IOS Press.

  37. Rosset, S., Zhu, J., & Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. Journal of Machine Learning Research, 5, 941–973.

  38. Schaal, S., & Atkeson, C. (1994). Robot juggling: An implementation of memory-based learning. IEEE Control Systems Magazine, 14.

  39. Shor, N. Z. (1985). Minimization methods for non-differentiable functions. Berlin: Springer.

  40. Silver, D., Bagnell, J. A., & Stentz, A. (2008). High performance outdoor navigation from overhead data using imitation learning. In Proceedings of Robotics: Science and Systems.

  41. Silver, D., Sofman, B., Vandapel, N., Bagnell, J. A., & Stentz, A. (2006). Experimental analysis of overhead data processing to support long range navigation. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems.

  42. Stentz, A., Bares, J., Pilarski, T., & Stager, D. (2007). The crusher system for autonomous navigation. In AUVSI’s unmanned systems.

  43. Taskar, B., Chatalbashev, V., Guestrin, C., & Koller, D. (2005). Learning structured prediction models: A large margin approach. In Twenty second international conference on machine learning (ICML05).

  44. Taskar, B., Guestrin, C., & Koller, D. (2003). Max margin Markov networks. In Advances in neural information processing systems (NIPS-14).

  45. Taskar, B., Lacoste-Julien, S., & Jordan, M. (2006). Structured prediction via the extragradient method. In Advances in neural information processing systems (Vol. 18). Cambridge: MIT.

  46. Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50, 2231–2242.

  47. Vandapel, N., Donamukkala, R. R., & Hebert, M. (2003). Quality assessment of traversability maps from aerial lidar data for an unmanned ground vehicle. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems.

  48. Ziebart, B., Bagnell, J. A., Maas, A., & Dey, A. (2008). Maximum entropy inverse reinforcement learning. In Twenty-third AAAI conference.

  49. Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the twentieth international conference on machine learning.

Author information

Corresponding author

Correspondence to Nathan D. Ratliff.

About this article

Cite this article

Ratliff, N.D., Silver, D. & Bagnell, J.A. Learning to search: Functional gradient techniques for imitation learning. Auton Robot 27, 25–53 (2009). https://doi.org/10.1007/s10514-009-9121-3

Keywords

  • Imitation learning
  • Structured prediction
  • Subgradient methods
  • Nonparametric optimization
  • Functional gradient techniques
  • Robotics
  • Planning
  • Autonomous navigation
  • Quadrupedal locomotion
  • Grasping