Robot Learning

Part of the Springer Handbooks book series (SHB)


Machine learning offers robotics a framework and a set of tools for designing sophisticated, hard-to-engineer behaviors; conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in robot learning. The relationship between the two disciplines is promising enough to be likened to that between physics and mathematics. In this chapter, we attempt to strengthen the links between the two research communities by surveying work in robot learning for learning control and behavior generation in robots. We highlight both key challenges in robot learning and notable successes. We discuss how contributions have tamed the complexity of the domain, and we study the role of algorithms, representations, and prior knowledge in achieving these successes. Accordingly, the chapter focuses in particular on model learning for control and on robot reinforcement learning. We demonstrate how machine learning approaches may be profitably applied, and throughout we note open questions and the tremendous potential for future research.


Keywords: Reinforcement Learning, Forward Model, Inverse Model, Reward Function, Inverse Dynamic

CMAC: cerebellar model articulation controller
DDP: differential dynamic programming
MDP: Markov decision process
MPC: model predictive control
MRAC: model reference adaptive control
OSC: operational-space control
REINFORCE: reward increment = nonnegative factor × offset reinforcement × characteristic eligibility
RL: reinforcement learning
SARSA: state-action-reward-state-action
SVD: singular value decomposition
SVR: support vector regression
ZMP: zero moment point
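
The expansion of REINFORCE in the list above is an update rule written out in words. As a rough sketch, it can be put in symbols in the form commonly attributed to Williams' policy-gradient rule. The symbols below are not defined anywhere in this text and are introduced here only for illustration: w_{ij} is a policy parameter, \alpha_{ij} a learning rate, r the received reward, b_{ij} a baseline, and g_i the probability (or density) with which unit i produces its output.

```latex
% reward increment = nonnegative factor x offset reinforcement x characteristic eligibility
\Delta w_{ij}
  = \underbrace{\alpha_{ij}}_{\text{nonnegative factor}}
    \,\underbrace{(r - b_{ij})}_{\text{offset reinforcement}}
    \,\underbrace{e_{ij}}_{\text{characteristic eligibility}},
\qquad
e_{ij} = \frac{\partial \ln g_i}{\partial w_{ij}},
\qquad
\alpha_{ij} \ge 0 .
```

Read this way, each factor of the acronym corresponds to one term: the learning rate scales the step, the reward minus a baseline sets its sign and size, and the log-probability gradient gives its direction.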


  1. 15.1
    S. Schaal: The new robotics – Towards human-centered machines, HFSP J. Front. Interdiscip. Res, Life Sci. 1(2), 115–126 (2007)Google Scholar
  2. 15.2
    B.D. Ziebart, A. Maas, J.A. Bagnell, A.K. Dey: Maximum entropy inverse reinforcement learning, AAAI Conf. Artif. Intell. (2008)Google Scholar
  3. 15.3
    S. Thrun, W. Burgard, D. Fox: Probabilistic Robotics (MIT, Cambridge 2005)zbMATHGoogle Scholar
  4. 15.4
    B. Apolloni, A. Ghosh, F. Alpaslan, L.C. Jain, S. Patnaik (Eds.): Machine Learning and Robot Perception, Stud. Comput. Intell., Vol. 7 (Springer, Berlin, Heidelberg 2005)zbMATHGoogle Scholar
  5. 15.5
    O. Jenkins, R. Bodenheimer, R. Peters: Manipulation manifolds: Explorations into uncovering manifolds in sensory-motor spaces, Int. Conf. Dev. Learn. (2006)Google Scholar
  6. 15.6
    M. Toussaint: Machine learning and robotics, Tutor. Conf. Mach. Learn. (2011)Google Scholar
  7. 15.7
    D.P. Bertsekas: Dynamic Programming and Optimal Control (Athena Scientific, Nashua 1995)zbMATHGoogle Scholar
  8. 15.8
    R.E. Kalman: When is a linear control system optimal?, J. Basic Eng. 86(1), 51–60 (1964)CrossRefGoogle Scholar
  9. 15.9
    D. Nguyen-Tuong, J. Peters: Model learning in robotics: A survey, Cogn. Process. 12(4), 319–340 (2011)CrossRefGoogle Scholar
  10. 15.10
    J. Kober, D. Bagnell, J. Peters: Reinforcement learning in robotics: A survey, Int. J. Robotics Res. 32(11), 1238–1274 (2013)CrossRefGoogle Scholar
  11. 15.11
    J.H. Connell, S. Mahadevan: Robot Learning (Kluwer Academic, Dordrecht 1993)zbMATHCrossRefGoogle Scholar
  12. 15.12
    J. Ham, Y. Lin, D.D. Lee: Learning nonlinear appearance manifolds for robot localization, Int. Conf. Intell. Robots Syst. (2005)Google Scholar
  13. 15.13
    R.S. Sutton, A.G. Barto: Reinforcement Learning (MIT, Cambridge 1998)Google Scholar
  14. 15.14
    D. Nguyen-Tuong, J. Peters: Model learning with local Gaussian process regression, Adv. Robotics 23(15), 2015–2034 (2009)CrossRefGoogle Scholar
  15. 15.15
    J. Nakanishi, R. Cory, M. Mistry, J. Peters, S. Schaal: Operational space control: A theoretical and emprical comparison, Int. J. Robotics Res. 27(6), 737–757 (2008)CrossRefGoogle Scholar
  16. 15.16
    F.R. Reinhart, J.J. Steil: Attractor-based computation with reservoirs for online learning of inverse kinematics, Proc. Eur. Symp. Artif. Neural Netw. (2009)Google Scholar
  17. 15.17
    J. Ting, M. Kalakrishnan, S. Vijayakumar, S. Schaal: Bayesian kernel shaping for learning control, Adv. Neural Inform. Process. Syst., Vol. 21 (2008) pp. 1673–1680Google Scholar
  18. 15.18
    J. Steffen, S. Klanke, S. Vijayakumar, H.J. Ritter: Realising dextrous manipulation with structured manifolds using unsupervised kernel regression with structural hints, ICRA 2009 Workshop: Approaches Sens. Learn. Humanoid Robots, Kobe (2009)Google Scholar
  19. 15.19
    S. Klanke, D. Lebedev, R. Haschke, J.J. Steil, H. Ritter: Dynamic path planning for a 7-dof robot arm, Proc. 2009 IEEE Int. Conf. Intell. Robots Syst. (2006)Google Scholar
  20. 15.20
    A. Angelova, L. Matthies, D. Helmick, P. Perona: Slip prediction using visual information, Proc. Robotics Sci. Syst., Philadelphia (2006)Google Scholar
  21. 15.21
    M. Kalakrishnan, J. Buchli, P. Pastor, S. Schaal: Learning locomotion over rough terrain using terrain templates, IEEE Int. Conf. Intell. Robots Syst. (2009)Google Scholar
  22. 15.22
    N. Hawes, J.L. Wyatt, M. Sridharan, M. Kopicki, S. Hongeng, I. Calvert, A. Sloman, G.-J. Kruijff, H. Jacobsson, M. Brenner, D. Skočaj, A. Vrečko, N. Majer, M. Zillich: The playmate system, Cognit. Syst. 8, 367–393 (2010)CrossRefGoogle Scholar
  23. 15.23
    D. Skočaj, M. Kristan, A. Vrečko, A. Leonardis, M. Fritz, M. Stark, B. Schiele, S. Hongeng, J.L. Wyatt: Multi-modal learning, Cogn. Syst. 8, 265–309 (2010)CrossRefGoogle Scholar
  24. 15.24
    O.J. Smith: A controller to overcome dead-time, Instrum. Soc. Am. J. 6, 28–33 (1959)Google Scholar
  25. 15.25
    K.S. Narendra, A.M. Annaswamy: Stable Adaptive Systems (Prentice Hall, New Jersey 1989)zbMATHGoogle Scholar
  26. 15.26
    S. Nicosia, P. Tomei: Model reference adaptive control algorithms for industrial robots, Automatica 20, 635–644 (1984)zbMATHCrossRefGoogle Scholar
  27. 15.27
    J.M. Maciejowski: Predictive Control with Constraints (Prentice Hall, New Jersey 2002)zbMATHGoogle Scholar
  28. 15.28
    R.S. Sutton: Dyna, an integrated architecture for learning, planning, and reacting, SIGART Bulletin 2(4), 160–163 (1991)CrossRefGoogle Scholar
  29. 15.29
    C.G. Atkeson, J. Morimoto: Nonparametric representation of policies and value functions: A trajectory-based approach, Adv. Neural Inform. Process. Syst., Vol. 15 (2002)Google Scholar
  30. 15.30
    A.Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, E. Liang: Autonomous inverted helicopter flight via reinforcement learning, Proc. 11th Int. Symp. Exp. Robotics (2004)Google Scholar
  31. 15.31
    C.E. Rasmussen, M. Kuss: Gaussian processes in reinforcement learning, Adv. Neural Inform. Process. Syst., Vol. 16 (2003) pp. 751–758Google Scholar
  32. 15.32
    A. Rottmann, W. Burgard: Adaptive autonomous control using online value iteration with Gaussian processes, Proc. IEEE Int. Conf. Robotics Autom. (2009)Google Scholar
  33. 15.33
    J.-J.E. Slotine, W. Li: Applied Nonlinear Control (Prentice Hall, Upper Saddle River 1991)zbMATHGoogle Scholar
  34. 15.34
    A. De Luca, P. Lucibello: A general algorithm for dynamic feedback linearization of robots with elastic joints, Proc. IEEE Int. Conf. Robotics Autom. (1998)Google Scholar
  35. 15.35
    I. Jordan, D. Rumelhart: Forward models: Supervised learning with a distal teacher, Cognit. Sci. 16, 307–354 (1992)CrossRefGoogle Scholar
  36. 15.36
    D.M. Wolpert, M. Kawato: Multiple paired forward and inverse models for motor control, Neural Netw. 11, 1317–1329 (1998)CrossRefGoogle Scholar
  37. 15.37
    M. Kawato: Internal models for motor control and trajectory planning, Curr. Opin. Neurobiol. 9(6), 718–727 (1999)CrossRefGoogle Scholar
  38. 15.38
    D.M. Wolpert, R.C. Miall, M. Kawato: Internal models in the cerebellum, Trends Cogn. Sci. 2(9), 338–347 (1998)CrossRefGoogle Scholar
  39. 15.39
    N. Bhushan, R. Shadmehr: Evidence for a forward dynamics model in human adaptive motor control, Adv. Neural Inform. Process. Syst., Vol. 11 (1999) pp. 3–9Google Scholar
  40. 15.40
    K. Narendra, J. Balakrishnan, M. Ciliz: Adaptation and learning using multiple models, switching and tuning, IEEE Control Syst, Mag. 15(3), 37–51 (1995)Google Scholar
  41. 15.41
    K. Narendra, J. Balakrishnan: Adaptive control using multiple models, IEEE Trans. Autom. Control 42(2), 171–187 (1997)MathSciNetzbMATHCrossRefGoogle Scholar
  42. 15.42
    M. Haruno, D.M. Wolpert, M. Kawato: Mosaic model for sensorimotor learning and control, Neural Comput. 13(10), 2201–2220 (2001)zbMATHCrossRefGoogle Scholar
  43. 15.43
    J. Peters, S. Schaal: Learning to control in operational space, Int. J. Robotics Res. 27(2), 197–212 (2008)CrossRefGoogle Scholar
  44. 15.44
    H. Akaike: Autoregressive model fitting for control, Ann. Inst. Stat. Math. 23, 163–180 (1970)MathSciNetzbMATHCrossRefGoogle Scholar
  45. 15.45
    R.M.C. De Keyser, A.R.V. Cauwenberghe: A self-tuning multistep predictor application, Automatica 17, 167–174 (1980)zbMATHCrossRefGoogle Scholar
  46. 15.46
    S.S. Billings, S. Chen, G. Korenberg: Identification of mimo nonlinear systems using a forward-regression orthogonal estimator, Int. J. Control 49, 2157–2189 (1989)MathSciNetzbMATHCrossRefGoogle Scholar
  47. 15.47
    E. Mosca, G. Zappa, J.M. Lemos: Robustness of multipredictor adaptive regulators: MUSMAR, Automatica 25, 521–529 (1989)MathSciNetzbMATHCrossRefGoogle Scholar
  48. 15.48
    J. Kocijan, R. Murray-Smith, C. Rasmussen, A. Girard: Gaussian process model based predictive control, Proc. Am. Control Conf. (2004)Google Scholar
  49. 15.49
    A. Girard, C.E. Rasmussen, J.Q. Candela, R.M. Smith: Gaussian process priors with uncertain inputs application to multiple-step ahead time series forecasting, Adv. Neural Inform. Process. Syst., Vol. 15 (2002) pp. 545–552Google Scholar
  50. 15.50
    C.G. Atkeson, A. Moore, S. Stefan: Locally weighted learning for control, AI Review 11, 75–113 (1997)Google Scholar
  51. 15.51
    L. Ljung: System Identification – Theory for the User (Prentice-Hall, New Jersey 2004)zbMATHGoogle Scholar
  52. 15.52
    S. Haykin: Neural Networks: A Comprehensive Foundation (Prentice Hall, New Jersey 1999)zbMATHGoogle Scholar
  53. 15.53
    J.J. Steil: Backpropagation-decorrelation: Online recurrent learning with O(N) complexity, Proc. Int. Jt. Conf. Neural Netw. (2004)Google Scholar
  54. 15.54
    C.E. Rasmussen, C.K. Williams: Gaussian Processes for Machine Learning (MIT, Cambridge 2006)zbMATHGoogle Scholar
  55. 15.55
    B. Schölkopf, A. Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond (MIT, Cambridge 2002)Google Scholar
  56. 15.56
    K.J. Aström, B. Wittenmark: Adaptive Control (Addison Wesley, Boston 1995)Google Scholar
  57. 15.57
    F.J. Coito, J.M. Lemos: A long-range adaptive controller for robot manipulators, Int. J. Robotics Res. 10, 684–707 (1991)CrossRefGoogle Scholar
  58. 15.58
    P. Vempaty, K. Cheok, R. Loh: Model reference adaptive control for actuators of a biped robot locomotion, Proc. World Congr. Eng. Comput. Sci. (2009)Google Scholar
  59. 15.59
    J.R. Layne, K.M. Passino: Fuzzy model reference learning control, J. Intell. Fuzzy Syst. 4, 33–47 (1996)zbMATHGoogle Scholar
  60. 15.60
    J. Nakanishi, J.A. Farrell, S. Schaal: Composite adaptive control with locally weighted statistical learning, Neural Netw. 18(1), 71–90 (2005)zbMATHCrossRefGoogle Scholar
  61. 15.61
    J.J. Craig: Introduction to Robotics: Mechanics and Control (Prentice Hall, Upper Saddle River 2004)Google Scholar
  62. 15.62
    M.W. Spong, S. Hutchinson, M. Vidyasagar: Robot Dynamics and Control (Wiley, New York 2006)Google Scholar
  63. 15.63
    S. Schaal, C.G. Atkeson, S. Vijayakumar: Scalable techniques from nonparametric statistics for real-time robot learning, Appl. Intell. 17(1), 49–60 (2002)zbMATHCrossRefGoogle Scholar
  64. 15.64
    H. Cao, Y. Yin, D. Du, L. Lin, W. Gu, Z. Yang: Neural network inverse dynamic online learning control on physical exoskeleton, 13th Int. Conf. Neural Inform. Process. (2006)Google Scholar
  65. 15.65
    C.G. Atkeson, C.H. An, J.M. Hollerbach: Estimation of inertial parameters of manipulator loads and links, Int. J. Robotics Res. 5(3), 101–119 (1986)CrossRefGoogle Scholar
  66. 15.66
    E. Burdet, B. Sprenger, A. Codourey: Experiments in nonlinear adaptive control, Int. Conf. Robotics Autom. 1, 537–542 (1997)CrossRefGoogle Scholar
  67. 15.67
    E. Burdet, A. Codourey: Evaluation of parametric and nonparametric nonlinear adaptive controllers, Robotica 16(1), 59–73 (1998)CrossRefGoogle Scholar
  68. 15.68
    K.S. Narendra, A.M. Annaswamy: Persistent excitation in adaptive systems, Int. J. Control 45, 127–160 (1987)MathSciNetzbMATHCrossRefGoogle Scholar
  69. 15.69
    H.D. Patino, R. Carelli, B.R. Kuchen: Neural networks for advanced control of robot manipulators, IEEE Trans. Neural Netw. 13(2), 343–354 (2002)CrossRefGoogle Scholar
  70. 15.70
    D. Nguyen-Tuong, J. Peters: Incremental sparsification for real-time online model learning, Neurocomputing 74(11), 1859–1867 (2011)CrossRefGoogle Scholar
  71. 15.71
    D. Nguyen-Tuong, J. Peters: Using model knowledge for learning inverse dynamics, Proc. IEEE Int. Conf. Robotics Autom. (2010)Google Scholar
  72. 15.72
    S.S. Ge, T.H. Lee, E.G. Tan: Adaptive neural network control of flexible joint robots based on feedback linearization, Int. J. Syst. Sci. 29(6), 623–635 (1998)CrossRefGoogle Scholar
  73. 15.73
    C.M. Chow, A.G. Kuznetsov, D.W. Clarke: Successive one-step-ahead predictions in multiple model predictive control, Int. J. Control 29, 971–979 (1998)zbMATHGoogle Scholar
  74. 15.74
    M. Kawato: Feedback error learning neural network for supervised motor learning. In: Advanced Neural Computers, ed. by R. Eckmiller (Elsevier, North-Holland, Amsterdam 1990) pp. 365–372Google Scholar
  75. 15.75
    J. Nakanishi, S. Schaal: Feedback error learning and nonlinear adaptive control, Neural Netw. 17(10), 1453–1465 (2004)zbMATHCrossRefGoogle Scholar
  76. 15.76
    T. Shibata, C. Schaal: Biomimetic gaze stabilization based on feedback-error learning with nonparametric regression networks, Neural Netw. 14(2), 201–216 (2001)CrossRefGoogle Scholar
  77. 15.77
    H. Miyamoto, M. Kawato, T. Setoyama, R. Suzuki: Feedback-error-learning neural network for trajectory control of a robotic manipulator, Neural Netw. 1(3), 251–265 (1988)CrossRefGoogle Scholar
  78. 15.78
    H. Gomi, M. Kawato: Recognition of manipulated objects by motor learning with modular architecture networks, Neural Netw. 6(4), 485–497 (1993)CrossRefGoogle Scholar
  79. 15.79
    A. D'Souza, S. Vijayakumar, S. Schaal: Learning inverse kinematics, IEEE Int. Conf. Intell. Robots Syst. (2001)Google Scholar
  80. 15.80
    S. Vijayakumar, S. Schaal: Locally weighted projection regression: An O(N) algorithm for incremental real time learning in high dimensional space, Proc. 16th Int. Conf. Mach. Learn. (2000)Google Scholar
  81. 15.81
    M. Toussaint, S. Vijayakumar: Learning discontinuities with products-of-sigmoids for switching between local models, Proc. 22nd Int. Conf. Mach. Learn. (2005)Google Scholar
  82. 15.82
    J. Tenenbaum, V. de Silva, J. Langford: A global geometric framework for nonlinear dimensionality reduction, Science 290, 2319–2323 (2000)CrossRefGoogle Scholar
  83. 15.83
    S. Roweis, L. Saul: Nonlinear dimensionality reduction by locally linear embedding, Science 290, 2323 (2000)CrossRefGoogle Scholar
  84. 15.84
    H. Hoffman, S. Schaal, S. Vijayakumar: Local dimensionality reduction for non-parametric regression, Neural Process. Lett. 29(2), 109–131 (2009)CrossRefGoogle Scholar
  85. 15.85
    S. Thrun, T. Mitchell: Lifelong robot learning, Robotics Auton. Syst. 15, 25–46 (1995)CrossRefGoogle Scholar
  86. 15.86
    Y. Engel, S. Mannor, R. Meir: Sparse online greedy support vector regression, Eur. Conf. Mach. Learn. (2002)Google Scholar
  87. 15.87
    A.J. Smola, B. Schölkopf: A tutorial on support vector regression, Stat. Comput. 14(3), 199–222 (2004)MathSciNetCrossRefGoogle Scholar
  88. 15.88
    C.E. Rasmussen: Evaluation of Gaussian Processes and Other Methods for Non-Linear Regression (University of Toronto, Toronto 1996)Google Scholar
  89. 15.89
    L. Bottou, O. Chapelle, D. DeCoste, J. Weston: Large-Scale Kernel Machines (MIT, Cambridge 2007)Google Scholar
  90. 15.90
    J.Q. Candela, C.E. Rasmussen: A unifying view of sparse approximate Gaussian process regression, J. Mach. Learn. Res. 6, 1939–1959 (2005)MathSciNetzbMATHGoogle Scholar
  91. 15.91
    R. Genov, S. Chakrabartty, G. Cauwenberghs: Silicon support vector machine with online learning, Int. J. Pattern Recognit. Articial Intell. 17, 385–404 (2003)CrossRefGoogle Scholar
  92. 15.92
    S. Vijayakumar, A. D'Souza, S. Schaal: Incremental online learning in high dimensions, Neural Comput. 12(11), 2602–2634 (2005)MathSciNetCrossRefGoogle Scholar
  93. 15.93
    B. Schölkopf, P. Simard, A. Smola, V. Vapnik: Prior knowledge in support vector kernel, Adv. Neural Inform. Process. Syst., Vol. 10 (1998) pp. 640–646Google Scholar
  94. 15.94
    E. Krupka, N. Tishby: Incorporating prior knowledge on features into learning, Int. Conf. Artif. Intell. Stat. (San Juan, Puerto Rico 2007)Google Scholar
  95. 15.95
    A. Smola, T. Friess, B. Schoelkopf: Semiparametric support vector and linear programming machines, Adv. Neural Inform. Process. Syst., Vol. 11 (1999) pp. 585–591Google Scholar
  96. 15.96
    B.J. Kröse, N. Vlassis, R. Bunschoten, Y. Motomura: A probabilistic model for appearance-based robot localization, Image Vis. Comput. 19, 381–391 (2001)CrossRefGoogle Scholar
  97. 15.97
    M.K. Titsias, N.D. Lawrence: Bayesian Gaussian process latent variable model, Proc. 13th Int. Conf. Artif. Intell. Stat. (2010)Google Scholar
  98. 15.98
    R. Jacobs, M. Jordan, S. Nowlan, G.E. Hinton: Adaptive mixtures of local experts, Neural Comput. 3, 79–87 (1991)CrossRefGoogle Scholar
  99. 15.99
    S. Calinon, F. D'halluin, E. Sauser, D. Caldwell, A. Billard: A probabilistic approach based on dynamical systems to learn and reproduce gestures by imitation, IEEE Robotics Autom. Mag. 17, 44–54 (2010)CrossRefGoogle Scholar
  100. 15.100
    V. Treps: A bayesian committee machine, Neural Comput. 12(11), 2719–2741 (2000)CrossRefGoogle Scholar
  101. 15.101
    L. Csato, M. Opper: Sparse online Gaussian processes, Neural Comput. 14(3), 641–668 (2002)zbMATHCrossRefGoogle Scholar
  102. 15.102
    D.H. Grollman, O.C. Jenkins: Sparse incremental learning for interactive robot control policy estimation, IEEE Int. Conf. Robotics Autom., Pasadena (2008)Google Scholar
  103. 15.103
    M. Seeger: Gaussian processes for machine learning, Int. J. Neural Syst. 14(2), 69–106 (2004)CrossRefGoogle Scholar
  104. 15.104
    C. Plagemann, S. Mischke, S. Prentice, K. Kersting, N. Roy, W. Burgard: Learning predictive terrain models for legged robot locomotion, Proc. IEEE Int. Conf. Intell. Robots Syst. (2008)Google Scholar
  105. 15.105
    J. Ko, D. Fox: GP-bayesfilters: Bayesian filtering using Gaussian process prediction and observation models, Auton. Robots 27(1), 75–90 (2009)CrossRefGoogle Scholar
  106. 15.106
    J.P. Ferreira, M. Crisostomo, A.P. Coimbra, B. Ribeiro: Simulation control of a biped robot with support vector regression, IEEE Int. Symp. Intell. Signal Process. (2007)Google Scholar
  107. 15.107
    R. Pelossof, A. Miller, P. Allen, T. Jebara: An SVM learning approach to robotic grasping, IEEE Int. Conf. Robotics Autom. (2004)Google Scholar
  108. 15.108
    J. Ma, J. Theiler, S. Perkins: Accurate on-line support vector regression, Neural Comput. 15, 2683–2703 (2005)zbMATHCrossRefGoogle Scholar
  109. 15.109
    Y. Choi, S.Y. Cheong, N. Schweighofer: Local online support vector regression for learning control, Proc. IEEE Int. Symp. Comput. Intell. Robotics Autom. (2007)Google Scholar
  110. 15.110
    J.-A. Ting, A. D'Souza, S. Schaal: Bayesian robot system identification with input and output noise, Neural Netw. 24(1), 99–108 (2011)zbMATHCrossRefGoogle Scholar
  111. 15.111
    S. Nowlan, G.E. Hinton: Evaluation of adaptive mixtures of competing experts, Adv. Neural Inform. Process. Syst., Vol. 3 (1991) pp. 774–780Google Scholar
  112. 15.112
    V. Treps: Mixtures of Gaussian processes, Adv. Neural Inform. Process. Syst., Vol. 13 (2001) pp. 654–660Google Scholar
  113. 15.113
    C.E. Rasmussen, Z. Ghahramani: Infinite mixtures of Gaussian process experts, Adv. Neural Inform. Process. Syst., Vol. 14 (2002) pp. 881–888Google Scholar
  114. 15.114
    T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning (Springer, New York, 2001)zbMATHCrossRefGoogle Scholar
  115. 15.115
    W.K. Haerdle, M. Mueller, S. Sperlich, A. Werwatz: Nonparametric and Semiparametric Models (Springer, New York 2004)CrossRefGoogle Scholar
  116. 15.116
    D.J. MacKay: A practical Bayesian framework for back-propagation networks, Computation 4(3), 448–472 (1992)Google Scholar
  117. 15.117
    R.M. Neal: Bayesian Learning for Neural Networks, Lecture Notes in Statistics, Vol. 118 (Springer, New York 1996)zbMATHGoogle Scholar
  118. 15.118
    B. Schölkopf, A.J. Smola, R. Williamson, P.L. Bartlett: New support vector algorithms, Neural Comput. 12(5), 1207–1245 (2000)CrossRefGoogle Scholar
  119. 15.119
    C. Plagemann, K. Kersting, P. Pfaff, W. Burgard: Heteroscedastic Gaussian process regression for modeling range sensors in mobile robotics, Snowbird Learn. Workshop (2007)Google Scholar
  120. 15.120
    W.S. Cleveland, C.L. Loader: Smoothing by local regression: Principles and methods. In: Statistical Theory and Computational Aspects of Smoothing, ed. by W. Härdle, M.G. Schimele (Physica, Heidelberg 1996)Google Scholar
  121. 15.121
    J. Fan, I. Gijbels: Local Polynomial Modelling and Its Applications (Chapman Hall, New York 1996)zbMATHGoogle Scholar
  122. 15.122
    J. Fan, I. Gijbels: Data driven bandwidth selection in local polynomial fitting, J. R. Stat. Soc. 57(2), 371–394 (1995)zbMATHGoogle Scholar
  123. 15.123
    A. Moore, M.S. Lee: Efficient algorithms for minimizing cross validation error, Proc. 11th Int. Conf. Mach. Learn. (1994)Google Scholar
  124. 15.124
    A. Moore: Fast, robust adaptive control by learning only forward models, Adv. Neural Inform. Process. Syst., Vol. 4 (1992) pp. 571–578Google Scholar
  125. 15.125
    C.G. Atkeson, A.W. Moore, S. Schaal: Locally weighted learning for control, Artif. Intell. Rev. 11, 75–113 (1997)CrossRefGoogle Scholar
  126. 15.126
    G. Tevatia, S. Schaal: Efficient Inverse Kinematics Algorithms for High-Dimensional Movement Systems (University of Southern California, Los Angeles 2008)Google Scholar
  127. 15.127
    C.G. Atkeson, A.W. Moore, S. Schaal: Locally weighted learning, Artif. Intell. Rev. 11(1–5), 11–73 (1997)CrossRefGoogle Scholar
  128. 15.128
    N.U. Edakunni, S. Schaal, S. Vijayakumar: Kernel carpentry for online regression using randomly varying coefficient model, Proc. 20th Int. Jt. Conf. Artif. Intell. (2007)Google Scholar
  129. 15.129
    D.H. Jacobson, D.Q. Mayne: Differential Dynamic Programming (American Elsevier, New York 1973)zbMATHGoogle Scholar
  130. 15.130
    C.G. Atkeson, S. Schaal: Robot learning from demonstration, Proc. 14th Int. Conf. Mach. Learn. (1997)Google Scholar
  131. 15.131
    J. Morimoto, G. Zeglin, C.G. Atkeson: Minimax differential dynamic programming: Application to a biped walking robot, Proc. 2009 IEEE Int. Conf. Intell. Robots Syst. (2003)Google Scholar
  132. 15.132
    P. Abbeel, A. Coates, M. Quigley, A.Y. Ng: An application of reinforcement learning to aerobatic helicopter flight, Adv. Neural Inform. Process. Syst., Vol. 19 (2007) pp. 1–8Google Scholar
  133. 15.133
    P.W. Glynn: Likelihood ratio gradient estimation: An overview, Proc. Winter Simul. Conf. (1987)Google Scholar
  134. 15.134
    A.Y. Ng, M. Jordan: Pegasus: A policy search method for large MDPs and POMDPs, Proc. 16th Conf. Uncertain. Artif. Intell. (2000)Google Scholar
  135. 15.135
    B.M. Akesson, H.T. Toivonen: A neural network model predictive controller, J. Process Control 16(9), 937–946 (2006)CrossRefGoogle Scholar
  136. 15.136
    D. Gu, H. Hu: Predictive control for a car-like mobile robot, Robotics Auton. Syst. 39, 73–86 (2002)CrossRefGoogle Scholar
  137. 15.137
    E.A. Wan, A.A. Bogdanov: Model predictive neural control with applications to a 6 DOF helicopter model, Proc. Am. Control Conf. (2001)Google Scholar
  138. 15.138
    O. Khatib: A unified approach for motion and force control of robot manipulators: The operational space formulation, J. Robotics Autom. 3(1), 43–53 (1987)CrossRefGoogle Scholar
  139. 15.139
    J. Peters, M. Mistry, F.E. Udwadia, J. Nakanishi, S. Schaal: A unifying methodology for robot control with redundant dofs, Auton. Robots 24(1), 1–12 (2008)CrossRefGoogle Scholar
  140. 15.140
    C. Salaun, V. Padois, O. Sigaud: Control of redundant robots using learned models: An operational space control approach, Proc. IEEE Int. Conf. Intell. Robots Syst. (2009)Google Scholar
  141. 15.141
    F.R. Reinhart, J.J. Steil: Recurrent neural associative learning of forward and inverse kinematics for movement generation of the redundant PA-10 robot, Symp. Learn. Adapt. Behav. Robotics Syst. (2008)Google Scholar
  142. 15.142
    J.Q. Candela, C.E. Rasmussen, C.K. Williams: Large Scale Kernel Machines (MIT, Cambridge 2007)Google Scholar
  143. 15.143
    S. Ben-David, R. Schuller: Exploiting task relatedness for multiple task learning, Proc. Conf. Learn. Theory (2003)Google Scholar
  144. 15.144
    I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun: Large margin methods for structured and interdependent output variables, J. Mach. Learn. Res. 6, 1453–1484 (2005)MathSciNetzbMATHGoogle Scholar
  145. 15.145
    O. Chapelle, B. Schölkopf, A. Zien: Semi-Supervised Learning (MIT, Cambridge 2006)CrossRefGoogle Scholar
  146. 15.146
    J.D. Lafferty, A. McCallum, F.C.N. Pereira: Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proc. 18th Int. Conf. Mach. Learn. (2001)Google Scholar
  147. 15.147
    K. Muelling, J. Kober, O. Kroemer, J. Peters: Learning to select and generalize striking movements in robot table tennis, Int. J. Robotics Res. 32(3), 263–279 (2012)CrossRefGoogle Scholar
  148. 15.148
    S. Mahadevan, J. Connell: Automatic programming of behavior-based robots using reinforcement learning, Artif. Intell. 55(2/3), 311–365 (1992)CrossRefGoogle Scholar
  149. 15.149
    V. Gullapalli, J.A. Franklin, H. Benbrahim: Acquiring robot skills via reinforcement learning, IEEE Control Syst. Mag. 14(1), 13–24 (1994)CrossRefGoogle Scholar
  150. 15.150
    J.A. Bagnell, J.C. Schneider: Autonomous helicopter control using reinforcement learning policy search methods, IEEE Int. Conf. Robotics Autom. (2001)Google Scholar
  151. 15.151
    S. Schaal: Learning from demonstration, Adv. Neural Inform. Process. Syst., Vol. 9 (1996) pp. 1040–1046Google Scholar
  152. 15.152
    W. B. Powell: AI, OR and Control Theory: A Rosetta Stone for Stochastic Optimization, Tech. Rep. (Princeton University, Princeton 2012)Google Scholar
  153. 15.153
    C.G. Atkeson: Nonparametric model-based reinforcement learning, Adv. Neural Inform. Process. Syst., Vol. 10 (1998) pp. 1008–1014Google Scholar
  154. 15.154
    A. Coates, P. Abbeel, A.Y. Ng: Apprenticeship learning for helicopter control, Communication ACM 52(7), 97–105 (2009)CrossRefGoogle Scholar
  155. 15.155
    R.S. Sutton, A.G. Barto, R.J. Williams: Reinforcement learning is direct adaptive optimal control, Am. Control Conf. (1991)Google Scholar
  156. 15.156
    A.D. Laud: Theory and Application of Reward Shaping in Reinforcement Learning (University of Illinois, Urbana-Champaign 2004)Google Scholar
  157. 15.157
    M.P. Deisenrot, C.E. Rasmussen: PILCO: A model-based and data-efficient approach to policy search, 28th Int. Conf. Mach. Learn. (2011)Google Scholar
  158. 15.158
    H. Miyamoto, S. Schaal, F. Gandolfo, H. Gomi, Y. Koike, R. Osu, E. Nakano, Y. Wada, M. Kawato: A Kendama learning robot based on bidirectional theory, Neural Netw. 9(8), 1281–1302 (1996)CrossRefGoogle Scholar
  159. 15.159
    N. Kohl, P. Stone: Policy gradient reinforcement learning for fast quadrupedal locomotion, IEEE Int. Conf. Robotics Autom. (2004)Google Scholar
  160. 15.160
    R. Tedrake, T.W. Zhang, H.S. Seung: Learning to walk in 20 minutes, Yale Workshop Adapt. Learn. Syst. (2005)Google Scholar
  161. 15.161
    J. Peters, S. Schaal: Reinforcement learning of motor skills with policy gradients, Neural Netw. 21(4), 682–697 (2008)CrossRefGoogle Scholar
  162. 15.162
    J. Peters, S. Schaal: Natural actor-critic, Neurocomputing 71(7–9), 1180–1190 (2008)CrossRefGoogle Scholar
  163. 15.163
    J. Kober, J. Peters: Policy search for motor primitives in robotics, Adv. Neural Inform. Process. Syst., Vol. 21 (2009) pp. 849–856Google Scholar
  164. 15.164
    M.P. Deisenroth, C.E. Rasmussen, D. Fox: Learning to control a low-cost manipulator using data-efficient reinforcement learning. In: Robotics: Science and Systems VII, ed. by H. Durrand-Whyte, N. Roy, P. Abbeel (MIT, Cambridge 2011)Google Scholar
  165. 15.165
    L.P. Kaelbling, M.L. Littman, A.W. Moore: Reinforcement learning: A survey, J. Artif. Intell. Res. 4, 237–285 (1996)Google Scholar
  166. 15.166
    M.E. Lewis, M.L. Puterman: The Handbook of Markov Decision Processes: Methods and Applications (Kluwer, Dordrecht 2001) pp. 89–111Google Scholar
  167. 15.167
    J. Peters, S. Vijayakumar, S. Schaal: Linear Quadratic Regulation as Benchmark for Policy Gradient Methods, Technical Report (University of Southern California, Los Angeles 2004)Google Scholar
  168. 15.168
    R.E. Bellman: Dynamic Programming (Princeton Univ. Press, Princeton 1957)zbMATHGoogle Scholar
  169. 15.169
    R.S. Sutton, D. McAllester, S.P. Singh, Y. Mansour: Policy gradient methods for reinforcement learning with function approximation, Adv. Neural Inform. Process. Syst., Vol. 12 (1999) pp. 1057–1063Google Scholar
  170. 15.170
    T. Jaakkola, M.I. Jordan, S.P. Singh: Convergence of stochastic iterative dynamic programming algorithms, Adv. Neural Inform. Process. Syst., Vol. 6 (1993) pp. 703–710Google Scholar
  171. 15.171
    J. Rust: Using randomization to break the curse of dimensionality, Econometrica 65(3), 487–516 (1997)MathSciNetzbMATHCrossRefGoogle Scholar
  172. 15.172
    D.E. Kirk: Optimal Control Theory (Prentice-Hall, Englewood Cliffs 1970)Google Scholar
  173. 15.173
    A. Schwartz: A reinforcement learning method for maximizing undiscounted rewards, Int. Conf. Mach. Learn. (1993)Google Scholar
  174. 15.174
    C.G. Atkeson, S. Schaal: Robot learning from demonstration, Int. Conf. Mach. Learn. (1997)Google Scholar
  175. 15.175
    J. Peters, K. Muelling, Y. Altun: Relative entropy policy search, Natl. Conf. Artif. Intell. (2010)Google Scholar
  176. 15.176
    G. Endo, J. Morimoto, T. Matsubara, J. Nakanishi, G. Cheng: Learning CPG-based biped locomotion with a policy gradient method: Application to a humanoid robot, Int. J. Robotics Res. 27(2), 213–228 (2008)CrossRefGoogle Scholar
  177. 15.177
    F. Guenter, M. Hersch, S. Calinon, A. Billard: Reinforcement learning for imitating constrained reaching movements, Adv. Robotics 21(13), 1521–1544 (2007)Google Scholar
  178. 15.178
    J.Z. Kolter, A.Y. Ng: Policy search via the signed derivative, Robotics Sci. Syst. V, Seattle (2009)Google Scholar
  179. 15.179
    A.Y. Ng, H.J. Kim, M.I. Jordan, S. Sastry: Autonomous helicopter flight via reinforcement learning, Adv. Neural Inform. Process. Syst., Vol. 16 (2004) pp. 799–806Google Scholar
  180. 15.180
    J.W. Roberts, L. Moret, J. Zhang, R. Tedrake: From motor to interaction learning in robots, Stud. Comput. Intell. 264, 293–309 (2010)
  181. 15.181
    R. Tedrake: Stochastic policy gradient reinforcement learning on a simple 3D biped, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2004)
  182. 15.182
    F. Stulp, E. Theodorou, M. Kalakrishnan, P. Pastor, L. Righetti, S. Schaal: Learning motion primitive goals for robust manipulation, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2011)
  183. 15.183
    M. Strens, A. Moore: Direct policy search using paired statistical tests, Int. Conf. Mach. Learn. (2001)
  184. 15.184
    A.Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, E. Liang: Autonomous inverted helicopter flight via reinforcement learning, Int. Symp. Exp. Robotics (2004)
  185. 15.185
    T. Geng, B. Porr, F. Wörgötter: Fast biped walking with a reflexive controller and real-time policy searching, Adv. Neural Inform. Process. Syst., Vol. 18 (2006) pp. 427–434
  186. 15.186
    N. Mitsunaga, C. Smith, T. Kanda, H. Ishiguro, N. Hagita: Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2005)
  187. 15.187
    M. Sato, Y. Nakamura, S. Ishii: Reinforcement learning for biped locomotion, Int. Conf. Artif. Neural Netw. (2002)
  188. 15.188
    R.Y. Rubinstein, D.P. Kroese: The Cross Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation (Springer, New York 2004)
  189. 15.189
    D.E. Goldberg: Genetic Algorithms (Addison Wesley, New York 1989)
  190. 15.190
    J.T. Betts: Practical Methods for Optimal Control Using Nonlinear Programming, Adv. Design Control, Vol. 3 (SIAM, Philadelphia 2001)
  191. 15.191
    R.J. Williams: Simple statistical gradient-following algorithms for connectionist reinforcement learning, Mach. Learn. 8, 229–256 (1992)
  192. 15.192
    P. Dayan, G.E. Hinton: Using expectation-maximization for reinforcement learning, Neural Comput. 9(2), 271–278 (1997)
  193. 15.193
    N. Vlassis, M. Toussaint, G. Kontes, S. Piperidis: Learning model-free robot control by a Monte Carlo EM algorithm, Auton. Robots 27(2), 123–130 (2009)
  194. 15.194
    J. Kober, E. Oztop, J. Peters: Reinforcement learning to adjust robot movements to new situations, Proc. Robotics Sci. Syst. Conf. (2010)
  195. 15.195
    E.A. Theodorou, J. Buchli, S. Schaal: Reinforcement learning of motor skills in high dimensions: A path integral approach, IEEE Int. Conf. Robotics Autom. (2010)
  196. 15.196
    J.A. Bagnell, A.Y. Ng, S. Kakade, J. Schneider: Policy search by dynamic programming, Adv. Neural Inform. Process. Syst., Vol. 16 (2003) pp. 831–838
  197. 15.197
    T. Kollar, N. Roy: Trajectory optimization using reinforcement learning for map exploration, Int. J. Robotics Res. 27(2), 175–197 (2008)
  198. 15.198
    D. Lizotte, T. Wang, M. Bowling, D. Schuurmans: Automatic gait optimization with Gaussian process regression, Int. Jt. Conf. Artif. Intell. (2007)
  199. 15.199
    S. Kuindersma, R. Grupen, A.G. Barto: Learning dynamic arm motions for postural recovery, IEEE-RAS Int. Conf. Humanoid Robots (2011)
  200. 15.200
    M. Tesch, J.G. Schneider, H. Choset: Using response surfaces and expected improvement to optimize snake robot gait parameters, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2011)
  201. 15.201
    S.-J. Yi, B.-T. Zhang, D. Hong, D.D. Lee: Learning full body push recovery control for small humanoid robots, IEEE Proc. Int. Conf. Robotics Autom. (2011)
  202. 15.202
    J.A. Boyan, A.W. Moore: Generalization in reinforcement learning: Safely approximating the value function, Adv. Neural Inform. Process. Syst., Vol. 7 (1995) pp. 369–376
  203. 15.203
    S. Kakade, J. Langford: Approximately optimal approximate reinforcement learning, Int. Conf. Mach. Learn. (2002)
  204. 15.204
    E. Greensmith, P.L. Bartlett, J. Baxter: Variance reduction techniques for gradient estimates in reinforcement learning, J. Mach. Learn. Res. 5, 1471–1530 (2004)
  205. 15.205
    M.T. Rosenstein, A.G. Barto: Reinforcement learning with supervision by a stable controller, Am. Control Conf. (2004)
  206. 15.206
    J.N. Tsitsiklis, B. Van Roy: An analysis of temporal-difference learning with function approximation, IEEE Trans. Autom. Control 42(5), 674–690 (1997)
  207. 15.207
    J.Z. Kolter, A.Y. Ng: Regularization and feature selection in least-squares temporal difference learning, Int. Conf. Mach. Learn. (2009)
  208. 15.208
    L.C. Baird, H. Klopf: Reinforcement Learning with High-Dimensional Continuous Actions, Technical Report WL-TR-93-1147 (Wright-Patterson Air Force Base, Dayton 1993)
  209. 15.209
    G.D. Konidaris, S. Osentoski, P. Thomas: Value function approximation in reinforcement learning using the Fourier basis, AAAI Conf. Artif. Intell. (2011)
  210. 15.210
    J. Peters, K. Muelling, J. Kober, D. Nguyen-Tuong, O. Kroemer: Towards motor skill learning for robotics, Int. Symp. Robotics Res. (2010)
  211. 15.211
    L. Buşoniu, R. Babuška, B. de Schutter, D. Ernst: Reinforcement Learning and Dynamic Programming Using Function Approximators (CRC, Boca Raton 2010)
  212. 15.212
    A.G. Barto, S. Mahadevan: Recent advances in hierarchical reinforcement learning, Discret. Event Dyn. Syst. 13(4), 341–379 (2003)
  213. 15.213
    S. Hart, R. Grupen: Learning generalizable control programs, IEEE Trans. Auton. Mental Dev. 3(3), 216–231 (2011)
  214. 15.214
    J.G. Schneider: Exploiting model uncertainty estimates for safe dynamic control learning, Adv. Neural Inform. Process. Syst., Vol. 9 (1997) pp. 1047–1053
  215. 15.215
    J.A. Bagnell: Learning Decisions: Robustness, Uncertainty, and Approximation. Dissertation (Robotics Institute, Carnegie Mellon University, Pittsburgh 2004)
  216. 15.216
    T.M. Moldovan, P. Abbeel: Safe exploration in Markov decision processes, 29th Int. Conf. Mach. Learn. (2012)
  217. 15.217
    T. Hester, M. Quinlan, P. Stone: RTMBA: A real-time model-based reinforcement learning architecture for robot control, IEEE Int. Conf. Robotics Autom. (2012)
  218. 15.218
    C.G. Atkeson: Using local trajectory optimizers to speed up global optimization in dynamic programming, Adv. Neural Inform. Process. Syst., Vol. 6 (1994) pp. 663–670
  219. 15.219
    J. Kober, J. Peters: Policy search for motor primitives in robotics, Mach. Learn. 84(1/2), 171–203 (2010)
  220. 15.220
    S. Russell: Learning agents for uncertain environments (extended abstract), Conf. Comput. Learn. Theory (1998)
  221. 15.221
    P. Abbeel, A.Y. Ng: Apprenticeship learning via inverse reinforcement learning, Int. Conf. Mach. Learn. (2004)
  222. 15.222
    N.D. Ratliff, J.A. Bagnell, M.A. Zinkevich: Maximum margin planning, Int. Conf. Mach. Learn. (2006)
  223. 15.223
    R.L. Keeney, H. Raiffa: Decisions with Multiple Objectives: Preferences and Value Tradeoffs (Wiley, New York 1976)
  224. 15.224
    N. Ratliff, D. Bradley, J.A. Bagnell, J. Chestnutt: Boosting structured prediction for imitation learning, Adv. Neural Inform. Process. Syst., Vol. 19 (2006) pp. 1153–1160
  225. 15.225
    D. Silver, J.A. Bagnell, A. Stentz: High performance outdoor navigation from overhead data using imitation learning. In: Robotics: Science and Systems, Vol. IV, ed. by O. Brock, J. Trinkle, F. Ramos (MIT, Cambridge 2008)
  226. 15.226
    D. Silver, J.A. Bagnell, A. Stentz: Learning from demonstration for autonomous navigation in complex unstructured terrain, Int. J. Robotics Res. 29(12), 1565–1592 (2010)
  227. 15.227
    N. Ratliff, J.A. Bagnell, S. Srinivasa: Imitation learning for locomotion and manipulation, IEEE-RAS Int. Conf. Humanoid Robots (2007)
  228. 15.228
    J.Z. Kolter, P. Abbeel, A.Y. Ng: Hierarchical apprenticeship learning with application to quadruped locomotion, Adv. Neural Inform. Process. Syst., Vol. 20 (2007) pp. 769–776
  229. 15.229
    J. Sorg, S.P. Singh, R.L. Lewis: Reward design via online gradient ascent, Adv. Neural Inform. Process. Syst., Vol. 23 (2010) pp. 2190–2198
  230. 15.230
    M. Zucker, J.A. Bagnell: Reinforcement planning: RL for optimal planners, IEEE Proc. Int. Conf. Robotics Autom. (2012)
  231. 15.231
    H. Benbrahim, J.S. Doleac, J.A. Franklin, O.G. Selfridge: Real-time learning: A ball on a beam, Int. Jt. Conf. Neural Netw. (1992)
  232. 15.232
    B. Nemec, M. Zorko, L. Zlajpah: Learning of a ball-in-a-cup playing robot, Int. Workshop Robotics, Alpe-Adria-Danube Region (2010)
  233. 15.233
    M. Tokic, W. Ertel, J. Fessler: The crawler, a class room demonstrator for reinforcement learning, Int. Fla. Artif. Intell. Res. Soc. Conf. (2009)
  234. 15.234
    H. Kimura, T. Yamashita, S. Kobayashi: Reinforcement learning of walking behavior for a four-legged robot, IEEE Conf. Decis. Control (2001)
  235. 15.235
    R.A. Willgoss, J. Iqbal: Reinforcement learning of behaviors in mobile robots using noisy infrared sensing, Aust. Conf. Robotics Autom. (1999)
  236. 15.236
    L. Paletta, G. Fritz, F. Kintzler, J. Irran, G. Dorffner: Perception and developmental learning of affordances in autonomous robots, Lect. Notes Comput. Sci. 4667, 235–250 (2007)
  237. 15.237
    C. Kwok, D. Fox: Reinforcement learning for sensing strategies, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2004)
  238. 15.238
    T. Yasuda, K. Ohkura: A reinforcement learning technique with an adaptive action generator for a multi-robot system, Int. Conf. Simul. Adapt. Behav. (2008)
  239. 15.239
    J.H. Piater, S. Jodogne, R. Detry, D. Kraft, N. Krüger, O. Kroemer, J. Peters: Learning visual representations for perception-action systems, Int. J. Robotics Res. 30(3), 294–307 (2011)
  240. 15.240
    M. Asada, S. Noda, S. Tawaratsumida, K. Hosoda: Purposive behavior acquisition for a real robot by vision-based reinforcement learning, Mach. Learn. 23(2/3), 279–303 (1996)
  241. 15.241
    M. Huber, R.A. Grupen: A feedback control structure for on-line learning tasks, Robotics Auton. Syst. 22(3/4), 303–315 (1997)
  242. 15.242
    P. Fidelman, P. Stone: Learning ball acquisition on a physical robot, Int. Symp. Robotics Autom. (2004)
  243. 15.243
    V. Soni, S.P. Singh: Reinforcement learning of hierarchical skills on the Sony AIBO robot, Int. Conf. Dev. Learn. (2006)
  244. 15.244
    B. Nemec, M. Tamošiunaitė, F. Wörgötter, A. Ude: Task adaptation through exploration and action sequencing, IEEE-RAS Int. Conf. Humanoid Robots (2009)
  245. 15.245
    M.J. Matarić: Reinforcement learning in the multi-robot domain, Auton. Robots 4, 73–83 (1997)
  246. 15.246
    M.J. Matarić: Reward functions for accelerated learning, Int. Conf. Mach. Learn. (ICML) (1994)
  247. 15.247
    R. Platt, R.A. Grupen, A.H. Fagg: Improving grasp skills using schema structured learning, Int. Conf. Dev. Learn. (2006)
  248. 15.248
    M. Dorigo, M. Colombetti: Robot Shaping: Developing Situated Agents Through Learning, Technical Report (International Computer Science Institute, Berkeley 1993)
  249. 15.249
    G.D. Konidaris, S. Kuindersma, R. Grupen, A.G. Barto: Autonomous skill acquisition on a mobile manipulator, AAAI Conf. Artif. Intell. (2011)
  250. 15.250
    G.D. Konidaris, S. Kuindersma, R. Grupen, A.G. Barto: Robot learning from demonstration by constructing skill trees, Int. J. Robotics Res. 31(3), 360–375 (2012)
  251. 15.251
    A. Cocora, K. Kersting, C. Plagemann, W. Burgard, L. de Raedt: Learning relational navigation policies, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2006)
  252. 15.252
    D. Katz, Y. Pyuro, O. Brock: Learning to manipulate articulated objects in unstructured environments using a grounded relational representation. In: Robotics: Science and Systems, Vol. IV, ed. by O. Brock, J. Trinkle, F. Ramos (MIT, Cambridge 2008)
  253. 15.253
    C.H. An, C.G. Atkeson, J.M. Hollerbach: Model-Based Control of a Robot Manipulator (MIT Press, Cambridge 1988)
  254. 15.254
    C. Gaskett, L. Fletcher, A. Zelinsky: Reinforcement learning for a vision based mobile robot, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2000)
  255. 15.255
    Y. Duan, B. Cui, H. Yang: Robot navigation based on fuzzy RL algorithm, Int. Symp. Neural Netw. (2008)
  256. 15.256
    H. Benbrahim, J.A. Franklin: Biped dynamic walking using reinforcement learning, Robotics Auton. Syst. 22(3/4), 283–302 (1997)
  257. 15.257
    W.D. Smart, L. Pack Kaelbling: A framework for reinforcement learning on real robots, Natl. Conf. Artif. Intell./Innov. Appl. Artif. Intell. (1998)
  258. 15.258
    D.C. Bentivegna: Learning from Observation Using Primitives (Georgia Institute of Technology, Atlanta 2004)
  259. 15.259
    A. Rottmann, C. Plagemann, P. Hilgers, W. Burgard: Autonomous blimp control using model-free reinforcement learning in a continuous state and action space, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2007)
  260. 15.260
    K. Gräve, J. Stückler, S. Behnke: Learning motion skills from expert demonstrations and own experience using Gaussian process regression, Jt. Int. Symp. Robotics (ISR) Ger. Conf. Robotics (ROBOTIK) (2010)
  261. 15.261
    O. Kroemer, R. Detry, J. Piater, J. Peters: Active learning using mean shift optimization for robot grasping, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2009)
  262. 15.262
    O. Kroemer, R. Detry, J. Piater, J. Peters: Combining active learning and reactive control for robot grasping, Robotics Auton. Syst. 58(9), 1105–1116 (2010)
  263. 15.263
    T. Tamei, T. Shibata: Policy gradient learning of cooperative interaction with a robot using user's biological signals, Int. Conf. Neural Inf. Process. (2009)
  264. 15.264
    A.J. Ijspeert, J. Nakanishi, S. Schaal: Learning attractor landscapes for learning motor primitives, Adv. Neural Inform. Process. Syst., Vol. 15 (2003) pp. 1547–1554
  265. 15.265
    S. Schaal, P. Mohajerian, A.J. Ijspeert: Dynamics systems vs. optimal control – A unifying view, Prog. Brain Res. 165(1), 425–445 (2007)
  266. 15.266
    H.-I. Lin, C.-C. Lai: Learning collision-free reaching skill from primitives, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2012)
  267. 15.267
    J. Kober, B. Mohler, J. Peters: Learning perceptual coupling for motor primitives, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2008)
  268. 15.268
    S. Bitzer, M. Howard, S. Vijayakumar: Using dimensionality reduction to exploit constraints in reinforcement learning, Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (2010)
  269. 15.269
    J. Buchli, F. Stulp, E. Theodorou, S. Schaal: Learning variable impedance control, Int. J. Robotics Res. 30(7), 820–833 (2011)
  270. 15.270
    P. Pastor, M. Kalakrishnan, S. Chitta, E. Theodorou, S. Schaal: Skill learning and task outcome prediction for manipulation, IEEE Int. Conf. Robotics Autom. (2011)
  271. 15.271
    M. Kalakrishnan, L. Righetti, P. Pastor, S. Schaal: Learning force control policies for compliant manipulation, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2011)
  272. 15.272
    D.C. Bentivegna, C.G. Atkeson, G. Cheng: Learning from observation and practice using behavioral primitives: Marble maze, 11th Int. Symp. Robotics Res. (2004)
  273. 15.273
    F. Kirchner: Q-learning of complex behaviours on a six-legged walking machine, EUROMICRO Workshop Adv. Mobile Robots (1997)
  274. 15.274
    J. Morimoto, K. Doya: Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, Robotics Auton. Syst. 36(1), 37–51 (2001)
  275. 15.275
    J.-Y. Donnart, J.-A. Meyer: Learning reactive and planning rules in a motivationally autonomous animat, Syst. Man Cybern. B 26(3), 381–395 (1996)
  276. 15.276
    C. Daniel, G. Neumann, J. Peters: Learning concurrent motor skills in versatile solution spaces, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2012)
  277. 15.277
    E.C. Whitman, C.G. Atkeson: Control of instantaneously coupled systems applied to humanoid walking, IEEE-RAS Int. Conf. Humanoid Robots (2010)
  278. 15.278
    X. Huang, J. Weng: Novelty and reinforcement learning in the value system of developmental robots, 2nd Int. Workshop Epigenetic Robotics Model. Cognit. Dev. Robotic Syst. (2002)
  279. 15.279
    M. Pendrith: Reinforcement learning in situated agents: Some theoretical problems and practical solutions, Eur. Workshop Learn. Robots (1999)
  280. 15.280
    B. Wang, J.W. Li, H. Liu: A heuristic reinforcement learning for robot approaching objects, IEEE Conf. Robotics Autom. Mechatron. (2006)
  281. 15.281
    L.P. Kaelbling: Learning in Embedded Systems (Stanford University, Stanford 1990)
  282. 15.282
    R.S. Sutton: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Int. Conf. Mach. Learn. (1990)
  283. 15.283
    A.W. Moore, C.G. Atkeson: Prioritized sweeping: Reinforcement learning with less data and less time, Mach. Learn. 13(1), 103–130 (1993)
  284. 15.284
    J. Peng, R.J. Williams: Incremental multi-step Q-learning, Mach. Learn. 22(1), 283–290 (1996)
  285. 15.285
    N. Jakobi, P. Husbands, I. Harvey: Noise and the reality gap: The use of simulation in evolutionary robotics, 3rd Eur. Conf. Artif. Life (1995)

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Autonomous Systems Lab, Technical University Darmstadt, Darmstadt, Germany
  2. Department of Electrical Systems Engineering, University of Pennsylvania, Philadelphia, USA
  3. Delft Center for Systems and Control, Delft University of Technology, Delft, Netherlands
  4. Corporate Research, Robert Bosch GmbH, Stuttgart, Germany
  5. Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
  6. Depts. of Computer Science, Neuroscience, and Biomedical Engineering, University of Southern California, Los Angeles, USA
