
Robot Learning

Chapter in:
Springer Handbook of Robotics

Part of the book series: Springer Handbooks (SHB)

Abstract

Machine learning offers robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors; conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in robot learning. The relationship between the disciplines has sufficient promise to be likened to that between physics and mathematics. In this chapter, we attempt to strengthen the links between the two research communities by providing a survey of work on learning control and behavior generation in robots. We highlight both key challenges in robot learning and notable successes. We discuss how contributions have tamed the complexity of the domain, and we study the role of algorithms, representations, and prior knowledge in achieving these successes. A particular focus of the chapter therefore lies on model learning for control and robot reinforcement learning. We demonstrate how machine learning approaches may be profitably applied, and we note open questions and the tremendous potential for future research throughout.
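As a concrete orientation for the chapter's two focus areas, the following is a minimal sketch, not taken from the chapter, of (1) learning a forward dynamics model from sampled transitions and (2) using that model inside a simple controller. The plant (true_dynamics), the target, and all numerical choices are invented purely for illustration.

# A minimal sketch of model learning for control (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def true_dynamics(s, a):
    # Hypothetical unknown plant: a damped scalar state pushed by action a.
    return 0.9 * s + 0.1 * a

# Model learning: collect noisy transitions and fit s' ~ theta0*s + theta1*a
# by linear least squares (a stand-in for richer regression models).
S = rng.uniform(-1.0, 1.0, size=200)
A = rng.uniform(-1.0, 1.0, size=200)
S_next = true_dynamics(S, A) + 0.01 * rng.standard_normal(200)
X = np.column_stack([S, A])
theta, *_ = np.linalg.lstsq(X, S_next, rcond=None)

def predict(s, a):
    # One-step prediction with the learned model.
    return theta[0] * s + theta[1] * a

# Model-based control: greedily pick the action whose predicted next state
# lies closest to a target (a one-step, MPC-flavored controller).
target = 0.5
candidates = np.linspace(-1.0, 1.0, 101)
s = -1.0
for _ in range(20):
    a = candidates[np.argmin((predict(s, candidates) - target) ** 2)]
    s = true_dynamics(s, a)  # act on the real plant
print(f"final state {s:.3f}, target {target}")

Swapping the least-squares fit for, e.g., Gaussian process regression, and the greedy one-step choice for a longer planning horizon, recovers the kind of model-learning-for-control setting the chapter surveys.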



Abbreviations

CMAC: cerebellar model articulation controller
DDP: differential dynamic programming
MDP: Markov decision process
MPC: model predictive control
MRAC: model reference adaptive control
OSC: operational-space control
PD: proportional–derivative
REINFORCE: reward increment = nonnegative factor × offset reinforcement × characteristic eligibility (spelled out as an equation after this list)
RL: reinforcement learning
SARSA: state–action–reward–state–action
SVD: singular value decomposition
SVR: support vector regression
ZMP: zero moment point
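For orientation (these equations restate standard formulations and are not text from this page): the REINFORCE expansion above is a mnemonic for the weight update in Williams' original algorithm, and SARSA takes its name from the (s, a, r, s', a') tuple that drives its temporal-difference update:

% REINFORCE: nonnegative factor x offset reinforcement x characteristic eligibility
\Delta w_{ij} \;=\; \underbrace{\alpha_{ij}}_{\text{nonnegative factor}}\;
\underbrace{\bigl(r - b_{ij}\bigr)}_{\text{offset reinforcement}}\;
\underbrace{\frac{\partial \ln g_i}{\partial w_{ij}}}_{\text{characteristic eligibility}}

% SARSA: temporal-difference update over the tuple (s, a, r, s', a')
Q(s, a) \;\leftarrow\; Q(s, a) + \alpha \bigl[\, r + \gamma\, Q(s', a') - Q(s, a) \,\bigr]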


Author information


Correspondence to Jan Peters or Stefan Schaal.


Video-References

  • Inverted helicopter hovering, available from http://handbookofrobotics.org/view-chapter/15/videodetails/352
  • Inverse reinforcement, available from http://handbookofrobotics.org/view-chapter/15/videodetails/353
  • Machine learning table tennis, available from http://handbookofrobotics.org/view-chapter/15/videodetails/354
  • Learning motor primitives, available from http://handbookofrobotics.org/view-chapter/15/videodetails/355


Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Peters, J., Lee, D.D., Kober, J., Nguyen-Tuong, D., Bagnell, J.A., Schaal, S. (2016). Robot Learning. In: Siciliano, B., Khatib, O. (eds) Springer Handbook of Robotics. Springer Handbooks. Springer, Cham. https://doi.org/10.1007/978-3-319-32552-1_15


  • DOI: https://doi.org/10.1007/978-3-319-32552-1_15


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32550-7

  • Online ISBN: 978-3-319-32552-1

  • eBook Packages: Engineering
