Learning to search: Functional gradient techniques for imitation learning

Ratliff, Nathan D.; Silver, David; Bagnell, J. Andrew

doi:10.1007/s10514-009-9121-3

Published: 17 June 2009

Autonomous Robots, volume 27, pages 25–53 (2009)

Nathan D. Ratliff¹,
David Silver¹ &
J. Andrew Bagnell²

Abstract

Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time-consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain and Sammut in Machine intelligence agents. London: Oxford University Press, 1995; Pomerleau in Advances in neural information processing systems 1, 1989; LeCun et al. in Advances in neural information processing systems 18, 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a purely supervised learning approach often leads to myopic and poor-quality robot performance.

While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al. in Proceedings of the IEEE-RAS international conference on humanoid robots, 2003) to outdoor unstructured navigation (Kelly et al. in Proceedings of the international symposium on experimental robotics (ISER), 2004; Stentz et al. in AUVSI’s unmanned systems, 2007), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explore learning these functions from expert human demonstration. These algorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert’s demonstration.
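The inverse optimal control idea described above can be illustrated with a toy sketch. This is not the LEARCH algorithm or the authors' implementation: it assumes a linear cost over two made-up features, replaces the planner with enumeration over three hypothetical candidate paths, and uses a simple margin-based subgradient step in the spirit of maximum margin planning. All names, paths, and constants are assumptions chosen for illustration.

```python
# Toy linear sketch of margin-based inverse optimal control.
# Paths, features, step size, and margin are all hypothetical.

# Each candidate path is summarized by feature counts: (grass cells, road cells).
PATH_FEATURES = {
    "A": (2.0, 0.0),  # short cut across grass
    "B": (0.0, 3.0),  # longer route along the road
    "C": (1.0, 2.0),  # mixed route
}
EXPERT = "B"  # the path the expert demonstrated


def path_cost(weights, features):
    """Linear cost: weighted sum of a path's feature counts."""
    return sum(w * f for w, f in zip(weights, features))


def plan(weights, margin=0.0):
    """Stand-in planner: enumerate paths and return the cheapest one.

    With margin > 0, every non-expert path gets a cost discount, so the
    expert path must win by at least that margin.
    """
    def score(name):
        discount = 0.0 if name == EXPERT else margin
        return path_cost(weights, PATH_FEATURES[name]) - discount

    return min(PATH_FEATURES, key=score)


def learn_weights(weights, step=0.25, margin=1.0, max_iters=100, min_weight=1e-3):
    """Adjust weights until the margin-augmented planner returns the expert path."""
    weights = list(weights)
    for _ in range(max_iters):
        rival = plan(weights, margin=margin)
        if rival == EXPERT:
            break
        expert_f = PATH_FEATURES[EXPERT]
        rival_f = PATH_FEATURES[rival]
        # Subgradient step: lower cost on features the expert used,
        # raise cost on features the rival used; keep weights positive.
        weights = [
            max(min_weight, w - step * (e - r))
            for w, e, r in zip(weights, expert_f, rival_f)
        ]
    return weights


learned = learn_weights([1.0, 1.0])
print(plan([1.0, 1.0]), plan(learned), learned)
```

On this toy data, the planner with unit weights prefers path A, while the learned weights make the expert's path B the cheapest. A real system would replace the enumeration in `plan` with an actual planner and the two features with costs computed from sensor data.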

The work we present extends the Maximum Margin Planning (MMP) (Ratliff et al. in Twenty second international conference on machine learning (ICML06), 2006a) framework to admit learning of more powerful, non-linear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, more efficient than previous attempts at non-linearization (Ratliff et al. in NIPS, 2006b), more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the function’s form. We derive and discuss the framework both mathematically and intuitively, and demonstrate practical real-world performance with three applied case-studies including legged locomotion, grasp planning, and autonomous outdoor unstructured navigation. The latter study includes hundreds of kilometers of autonomous traversal through complex natural environments. These case-studies address key challenges in applying the algorithm in practical settings that utilize state-of-the-art planners, and which may be constrained by efficiency requirements and imperfect expert demonstration.

References

Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In ICML ’04: Proceedings of the twenty-first international conference on machine learning.
Anderson, B. D. O., & Moore, J. B. (1990). Optimal control: linear quadratic methods. Englewood Cliffs: Prentice Hall.
Argall, B., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems.
Atkeson, C., Schaal, S., & Moore, A. (1995). Locally weighted learning. AI Review.
Bain, M., & Sammut, C. (1995). A framework for behavioral cloning. In Machine intelligence agents. London: Oxford University Press.
Boyd, S., Ghaoui, L. E., Feron, E., & Balakrishnan, V. (1994). Linear matrix inequalities in system and control theory. Society for Industrial and Applied Mathematics (SIAM).
Calinon, S., Guenter, F., & Billard, A. (2007). On learning, representing and generalizing a task in a humanoid robot. In IEEE Transactions on Systems, Man and Cybernetics, Part B. Special issue on robot learning by observation, demonstration and imitation, 37, 286–298.
Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. New York: Cambridge University Press.
Chestnutt, J., Kuffner, J., Nishiwaki, K., & Kagami, S. (2003). Planning biped navigation strategies in complex environments. In Proceedings of the IEEE-RAS, international conference on humanoid robots. Karlsruhe, Germany.
Chestnutt, J., Lau, M., Cheng, G., Kuffner, J., Hodgins, J., & Kanade, T. (2005). Footstep planning for the Honda ASIMO humanoid. In Proceedings of the IEEE, international conference on robotics and automation.
Donoho, D. L., & Elad, M. (2003). Maximal sparsity representation via l1 minimization. Proceedings of the National Academy of Sciences, 100, 2197–2202.
Ferguson, D., & Stentz, A. (2006). Using interpolation to improve path planning: The field D* algorithm. Journal of Field Robotics, 23, 79–101.
Friedman, J. H. (1999a). Greedy function approximation: A gradient boosting machine. Annals of Statistics.
Gordon, G. (1999). Approximate solutions to Markov decision processes. Doctoral dissertation, Robotics Institute, Carnegie Mellon University.
Hersch, M., Guenter, F., Calinon, S., & Billard, A. (2008). Dynamical system modulation for robot learning via kinesthetic demonstrations. IEEE Transactions on Robotics, 24, 1463–1467.
Jaynes, E. (2003). Probability: The logic of science. Cambridge: Cambridge University Press.
Kalman, R. (1964). When is a linear control system optimal? Transactions of the ASME, Journal of Basic Engineering, 86, 51–60.
Kelly, A., Amidi, O., Happold, M., Herman, H., Pilarski, T., Rander, P., Stentz, A., Vallidis, N., & Warner, R. (2004). Toward reliable autonomous vehicles operating in challenging environments. In Proceedings of the international symposium on experimental robotics (ISER). Singapore.
Kivinen, J., & Warmuth, M. K. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132.
Kolter, J. Z., Abbeel, P., & Ng, A. Y. (2008). Hierarchical apprenticeship learning with application to quadruped locomotion. Neural Information Processing Systems, 20.
Kulesza, A., & Pereira, F. (2008). Structured learning with approximate inference. In Advances in neural information processing systems. Cambridge: MIT.
LeCun, Y., Muller, U., Ben, J., Cosatto, E., & Flepp, B. (2006). Off-road obstacle avoidance through end-to-end learning. In Advances in neural information processing systems (Vol. 18). Cambridge: MIT.
Mason, L., Baxter, J., Bartlett, P., & Frean, M. (1999). Functional gradient techniques for combining hypotheses. In Advances in large margin classifiers. Cambridge: MIT.
Miller, A. T., Knoop, S., Allen, P. K., & Christensen, H. I. (2003). Automatic grasp planning using shape primitives. In Proceedings of the IEEE, International conference on robotics and automation.
Munoz, D., Bagnell, J. A. D., Vandapel, N., & Hebert, M. (2009). Contextual classification with functional max-margin Markov networks. In IEEE computer society conference on computer vision and pattern recognition (CVPR).
Munoz, D., Vandapel, N., & Hebert, M. (2008). Directional associative Markov network for 3-d point cloud classification. In Fourth international symposium on 3D data processing, visualization and transmission.
Neu, G., & Szepesvari, C. (2007). Apprenticeship learning using inverse reinforcement learning and gradient methods. In Uncertainty in artificial intelligence (UAI).
Ng, A. Y., & Russell, S. (2000). Algorithms for inverse reinforcement learning. In Proc. 17th international conf. on machine learning.
Pomerleau, D. (1989). ALVINN: An autonomous land vehicle in a neural network. In Advances in neural information processing systems (Vol. 1).
Puterman, M. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.
Ratliff, N., & Bagnell, J. A. (2009). Functional bundle methods. In The Learning workshop. Clearwater Beach, Florida.
Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2006a). Maximum margin planning. In Twenty second international conference on machine learning (ICML06).
Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2007a). (Online) subgradient methods for structured prediction. In Artificial intelligence and statistics. San Juan, Puerto Rico.
Ratliff, N., Bradley, D., Bagnell, J. A., & Chestnutt, J. (2006b). Boosting structured prediction for imitation learning. In NIPS. Vancouver, B.C.
Ratliff, N., Srinivasa, S., & Bagnell, J. A. (2007b). Imitation learning for locomotion and manipulation. In IEEE-RAS international conference on humanoid robots.
Rifkin, R., Yeo, G., & Poggio, T. (2003). Regularized least squares classification. In Advances in learning theory: methods, models and applications. Amsterdam: IOS Press.
Rosset, S., Zhu, J., & Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. Journal of Machine Learning Research, 5, 941–973.
Schaal, S., & Atkeson, C. (1994). Robot juggling: An implementation of memory-based learning. IEEE Control Systems Magazine, 14.
Shor, N. Z. (1985). Minimization methods for non-differentiable functions. Berlin: Springer.
Silver, D., Bagnell, J. A., & Stentz, A. (2008). High performance outdoor navigation from overhead data using imitation learning. In Proceedings of Robotics Science and Systems.
Silver, D., Sofman, B., Vandapel, N., Bagnell, J. A., & Stentz, A. (2006). Experimental analysis of overhead data processing to support long range navigation. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems.
Stentz, A., Bares, J., Pilarski, T., & Stager, D. (2007). The crusher system for autonomous navigation. In AUVSI’s unmanned systems.
Taskar, B., Chatalbashev, V., Guestrin, C., & Koller, D. (2005). Learning structured prediction models: A large margin approach. In Twenty second international conference on machine learning (ICML05).
Taskar, B., Guestrin, C., & Koller, D. (2003). Max margin Markov networks. In Advances in neural information processing systems (NIPS-14).
Taskar, B., Lacoste-Julien, S., & Jordan, M. (2006). Structured prediction via the extragradient method. In Advances in neural information processing systems (Vol. 18). Cambridge: MIT.
Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50, 2231–2242.
Vandapel, N., Donamukkala, R. R., & Hebert, M. (2003). Quality assessment of traversability maps from aerial lidar data for an unmanned ground vehicle. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems.
Ziebart, B., Bagnell, J. A., Maas, A., & Dey, A. (2008). Maximum entropy inverse reinforcement learning. In Twenty-third AAAI conference.
Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the twentieth international conference on machine learning.

Author information

Authors and Affiliations

Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Nathan D. Ratliff & David Silver
Robotics Institute and Machine Learning, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
J. Andrew Bagnell

Corresponding author

Correspondence to Nathan D. Ratliff.

About this article

Cite this article

Ratliff, N.D., Silver, D. & Bagnell, J.A. Learning to search: Functional gradient techniques for imitation learning. Auton Robot 27, 25–53 (2009). https://doi.org/10.1007/s10514-009-9121-3

Received: 18 November 2008
Accepted: 13 May 2009
Published: 17 June 2009
Issue Date: July 2009
DOI: https://doi.org/10.1007/s10514-009-9121-3
