Learning to search: Functional gradient techniques for imitation learning
 Nathan D. Ratliff,
 David Silver,
 J. Andrew Bagnell
Abstract
Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” approaches (Bain and Sammut in Machine intelligence agents. London: Oxford University Press, 1995; Pomerleau in Advances in neural information processing systems 1, 1989; LeCun et al. in Advances in neural information processing systems 18, 2006) that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to myopic and poor-quality robot performance.
While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al. in Proceedings of the IEEE-RAS international conference on humanoid robots, 2003) to outdoor unstructured navigation (Kelly et al. in Proceedings of the international symposium on experimental robotics (ISER), 2004; Stentz et al. in AUVSI’s unmanned systems, 2007), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explores learning these functions from expert human demonstration. These algorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert’s demonstration.
The work we present extends the Maximum Margin Planning (MMP) (Ratliff et al. in Twenty second international conference on machine learning (ICML06), 2006a) framework to admit learning of more powerful, nonlinear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, more efficient than previous attempts at nonlinearization (Ratliff et al. in NIPS, 2006b), more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the function’s form. We derive and discuss the framework both mathematically and intuitively, and demonstrate practical real-world performance with three applied case studies including legged locomotion, grasp planning, and autonomous outdoor unstructured navigation. The latter study includes hundreds of kilometers of autonomous traversal through complex natural environments. These case studies address key challenges in applying the algorithm in practical settings that utilize state-of-the-art planners, and which may be constrained by efficiency requirements and imperfect expert demonstration.
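The abstract does not spell out the learning update, so the following is only a loose sketch of the general idea, under stated assumptions: a planner is run on the current cost plus a margin term, cost is raised where the planner goes and lowered where the expert went, and the update is applied to the log of the cost so that the cost stays positive. Everything named here (`plan`, `visit_counts`, `learn_cost`, the 4x4 grid, `rate`, `margin`) is hypothetical and chosen for illustration. A per-cell cost table stands in for the nonlinear cost function over features that the paper learns, so this is not the authors' implementation.

```python
import heapq
import numpy as np

def plan(cost, start, goal):
    """Dijkstra on a 4-connected grid; entering a cell pays that cell's cost."""
    rows, cols = cost.shape
    dist, parent = {start: 0.0}, {}
    frontier = [(0.0, start)]
    while frontier:
        d, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                nd = d + float(cost[nxt])
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt], parent[nxt] = nd, cell
                    heapq.heappush(frontier, (nd, nxt))
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

def visit_counts(path, shape):
    """Number of times the path touches each grid cell."""
    counts = np.zeros(shape)
    for cell in path:
        counts[cell] += 1.0
    return counts

def learn_cost(expert_path, shape, steps=50, rate=0.5, margin=0.5):
    """Fit a positive per-cell cost so that planning reproduces expert_path."""
    start, goal = expert_path[0], expert_path[-1]
    mu_expert = visit_counts(expert_path, shape)
    # Margin term: cells off the expert's path look cheaper to the planner.
    # Assumes margin < 1 so the augmented cost stays positive (off-path
    # cells start at cost 1 and are only ever raised).
    loss = margin * (mu_expert == 0)
    log_cost = np.zeros(shape)  # cost = exp(log_cost) is always positive
    for _ in range(steps):
        cost = np.exp(log_cost)
        planned = plan(cost - loss, start, goal)  # loss-augmented planning
        if planned == expert_path:
            break  # expert path already wins by the margin
        # Raise log-cost where the planner went, lower it where the expert went.
        log_cost += rate * (visit_counts(planned, shape) - mu_expert)
    return np.exp(log_cost)

# Hypothetical demonstration: along the top row, then down the right column.
expert = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (3, 3)]
learned = learn_cost(expert, shape=(4, 4))
replanned = plan(learned, expert[0], expert[-1])
```

Updating the log of the cost rather than the cost itself is one simple way to keep every cell's cost positive without an explicit constraint, which a shortest-path planner such as Dijkstra requires.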
References
 Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In ICML ’04: Proceedings of the twenty-first international conference on machine learning.
 Anderson, B. D. O., & Moore, J. B. (1990). Optimal control: Linear quadratic methods. Englewood Cliffs: Prentice Hall.
 Argall, B., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems.
 Atkeson, C., Schaal, S., & Moore, A. (1995). Locally weighted learning. AI Review.
 Bain, M., & Sammut, C. (1995). A framework for behavioral cloning. In Machine intelligence agents. London: Oxford University Press.
 Boyd, S., Ghaoui, L. E., Feron, E., & Balakrishnan, V. (1994). Linear matrix inequalities in system and control theory. Society for Industrial and Applied Mathematics (SIAM).
 Calinon, S., Guenter, F., & Billard, A. (2007). On learning, representing and generalizing a task in a humanoid robot. In IEEE Transactions on Systems, Man and Cybernetics, Part B. Special issue on robot learning by observation, demonstration and imitation, 37, 286–298.
 Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. New York: Cambridge University Press.
 Chestnutt, J., Kuffner, J., Nishiwaki, K., & Kagami, S. (2003). Planning biped navigation strategies in complex environments. In Proceedings of the IEEE-RAS international conference on humanoid robots. Karlsruhe, Germany.
 Chestnutt, J., Lau, M., Cheng, G., Kuffner, J., Hodgins, J., & Kanade, T. (2005). Footstep planning for the Honda ASIMO humanoid. In Proceedings of the IEEE international conference on robotics and automation.
 Donoho, D. L., & Elad, M. (2003). Maximal sparsity representation via l1 minimization. Proceedings of the National Academy of Sciences, 100, 2197–2202.
 Ferguson, D., & Stentz, A. (2006). Using interpolation to improve path planning: The field D* algorithm. Journal of Field Robotics, 23, 79–101.
 Friedman, J. H. (1999a). Greedy function approximation: A gradient boosting machine. Annals of Statistics.
 Gordon, G. (1999). Approximate solutions to Markov decision processes. Doctoral dissertation, Robotics Institute, Carnegie Mellon University.
 Hersch, M., Guenter, F., Calinon, S., & Billard, A. (2008). Dynamical system modulation for robot learning via kinesthetic demonstrations. IEEE Transactions on Robotics, 24, 1463–1467.
 Jaynes, E. (2003). Probability: The logic of science. Cambridge: Cambridge University Press.
 Kalman, R. (1964). When is a linear control system optimal? Transactions of the ASME, Journal of Basic Engineering, 86, 51–60.
 Kelly, A., Amidi, O., Happold, M., Herman, H., Pilarski, T., Rander, P., Stentz, A., Vallidis, N., & Warner, R. (2004). Toward reliable autonomous vehicles operating in challenging environments. In Proceedings of the international symposium on experimental robotics (ISER). Singapore.
 Kivinen, J., & Warmuth, M. K. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132.
 Kolter, J. Z., Abbeel, P., & Ng, A. Y. (2008). Hierarchical apprenticeship learning with application to quadruped locomotion. Neural Information Processing Systems, 20.
 Kulesza, A., & Pereira, F. (2008). Structured learning with approximate inference. In Advances in neural information processing systems. Cambridge: MIT Press.
 LeCun, Y., Muller, U., Ben, J., Cosatto, E., & Flepp, B. (2006). Off-road obstacle avoidance through end-to-end learning. In Advances in neural information processing systems. Cambridge: MIT Press.
 Mason, L., Baxter, J., Bartlett, P., & Frean, M. (1999). Functional gradient techniques for combining hypotheses. In Advances in large margin classifiers. Cambridge: MIT Press.
 Miller, A. T., Knoop, S., Allen, P. K., & Christensen, H. I. (2003). Automatic grasp planning using shape primitives. In Proceedings of the IEEE international conference on robotics and automation.
 Munoz, D., Bagnell, J. A. D., Vandapel, N., & Hebert, M. (2009). Contextual classification with functional max-margin Markov networks. In IEEE computer society conference on computer vision and pattern recognition (CVPR).
 Munoz, D., Vandapel, N., & Hebert, M. (2008). Directional associative Markov network for 3d point cloud classification. In Fourth international symposium on 3D data processing, visualization and transmission.
 Neu, G., & Szepesvari, C. (2007). Apprenticeship learning using inverse reinforcement learning and gradient methods. In Uncertainty in artificial intelligence (UAI).
 Ng, A. Y., & Russell, S. (2000). Algorithms for inverse reinforcement learning. In Proc. 17th international conf. on machine learning.
 Pomerleau, D. (1989). ALVINN: An autonomous land vehicle in a neural network. In Advances in neural information processing systems (Vol. 1).
 Puterman, M. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.
 Ratliff, N., & Bagnell, J. A. (2009). Functional bundle methods. In The Learning workshop. Clearwater Beach, Florida.
 Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2006a). Maximum margin planning. In Twenty second international conference on machine learning (ICML06).
 Ratliff, N., Bagnell, J. A., & Zinkevich, M. (2007a). (Online) subgradient methods for structured prediction. In Artificial intelligence and statistics. San Juan, Puerto Rico.
 Ratliff, N., Bradley, D., Bagnell, J. A., & Chestnutt, J. (2006b). Boosting structured prediction for imitation learning. In NIPS. Vancouver, B.C.
 Ratliff, N., Srinivasa, S., & Bagnell, J. A. (2007b). Imitation learning for locomotion and manipulation. In IEEE-RAS international conference on humanoid robots.
 Rifkin, R., Yeo, G., & Poggio, T. (2003). Regularized least squares classification. In Advances in learning theory: Methods, models and applications. Amsterdam: IOS Press.
 Rosset, S., Zhu, J., & Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. Journal of Machine Learning Research, 5, 941–973.
 Schaal, S., & Atkeson, C. (1994). Robot juggling: An implementation of memory-based learning. IEEE Control Systems Magazine, 14.
 Shor, N. Z. (1985). Minimization methods for nondifferentiable functions. Berlin: Springer.
 Silver, D., Bagnell, J. A., & Stentz, A. (2008). High performance outdoor navigation from overhead data using imitation learning. In Proceedings of Robotics Science and Systems.
 Silver, D., Sofman, B., Vandapel, N., Bagnell, J. A., & Stentz, A. (2006). Experimental analysis of overhead data processing to support long range navigation. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems.
 Stentz, A., Bares, J., Pilarski, T., & Stager, D. (2007). The crusher system for autonomous navigation. In AUVSI’s unmanned systems.
 Taskar, B., Chatalbashev, V., Guestrin, C., & Koller, D. (2005). Learning structured prediction models: A large margin approach. In Twenty second international conference on machine learning (ICML05).
 Taskar, B., Guestrin, C., & Koller, D. (2003). Max margin Markov networks. In Advances in neural information processing systems (NIPS14).
 Taskar, B., Lacoste-Julien, S., & Jordan, M. (2006). Structured prediction via the extragradient method. In Advances in neural information processing systems. Cambridge: MIT Press.
 Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50, 2231–2242.
 Vandapel, N., Donamukkala, R. R., & Hebert, M. (2003). Quality assessment of traversability maps from aerial lidar data for an unmanned ground vehicle. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems.
 Ziebart, B., Bagnell, J. A., Maas, A., & Dey, A. (2008). Maximum entropy inverse reinforcement learning. In Twenty-third AAAI conference.
 Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the twentieth international conference on machine learning.
 Journal
 Autonomous Robots
 Volume 27, Issue 1, pp 25–53
 Cover Date
 2009-07-01
 DOI
 10.1007/s10514-009-9121-3
 Print ISSN
 0929-5593
 Online ISSN
 1573-7527
 Publisher
 Springer US
 Keywords

 Imitation learning
 Structured prediction
 Subgradient methods
 Nonparametric optimization
 Functional gradient techniques
 Robotics
 Planning
 Autonomous navigation
 Quadrupedal locomotion
 Grasping
 Authors

 Nathan D. Ratliff (1)
 David Silver (1)
 J. Andrew Bagnell (2)
 Author Affiliations

 1. Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
 2. Robotics Institute and Machine Learning, Carnegie Mellon University, Pittsburgh, PA, 15213, USA