Reinforcement Learning in Robotics: A Survey

  • Chapter

Part of the book series: Adaptation, Learning, and Optimization ((ALO,volume 12))

Abstract

As most action-generation problems of autonomous robots can be phrased as sequential decision problems, robotics offers a tremendously important and interesting application platform for reinforcement learning. Conversely, the challenges of this domain provide a major real-world test for reinforcement learning. Hence, the interplay between the two disciplines can be seen as promising as that between physics and mathematics. Nevertheless, only a fraction of the scientists working on reinforcement learning are sufficiently tied to robotics to be aware of most problems encountered in this context. Thus, we bring the most important challenges faced by robot reinforcement learning to their attention. To achieve this goal, we survey most work that has successfully applied reinforcement learning to behavior generation for real robots. We discuss how the presented approaches have been made tractable despite the complexity of the domain, and study how representations or the inclusion of prior knowledge can make a significant difference. A particular focus of our chapter therefore lies on the choice between model-based and model-free as well as between value function-based and policy search methods. The result is a fairly complete survey of robot reinforcement learning that should allow a general reinforcement learning researcher to understand this domain.
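The abstract's central methodological distinction, value function-based learning versus policy search, can be made concrete with a minimal sketch. The toy two-armed bandit below is our own illustration, not an example from the chapter: it contrasts a tabular Q-learning update (value-based) with a REINFORCE policy-gradient update on a softmax policy (policy search). All names and constants are hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical two-armed bandit: arm 1 pays 1.0 on average, arm 0 pays 0.2.
def pull(arm):
    return random.gauss(1.0 if arm == 1 else 0.2, 0.1)

# Value function-based: tabular Q-learning with epsilon-greedy exploration.
Q = [0.0, 0.0]
alpha, eps = 0.1, 0.1
for _ in range(2000):
    if random.random() < eps:
        arm = random.randrange(2)                       # explore
    else:
        arm = max(range(2), key=lambda a: Q[a])         # exploit current estimate
    Q[arm] += alpha * (pull(arm) - Q[arm])              # move estimate toward observed reward

# Policy search: REINFORCE on a softmax policy parameterised by theta.
theta = [0.0, 0.0]
lr = 0.05
for _ in range(2000):
    z = [math.exp(t) for t in theta]
    p = [x / sum(z) for x in z]                         # softmax action probabilities
    arm = 0 if random.random() < p[0] else 1
    r = pull(arm)
    for a in range(2):
        # grad of log pi(arm) w.r.t. theta[a] is 1[a == arm] - p[a]
        theta[a] += lr * r * ((1.0 if a == arm else 0.0) - p[a])
```

After training, the value-based learner's estimate Q[1] exceeds Q[0], and the policy-search learner's theta[1] exceeds theta[0], so both converge on the better arm; the chapter discusses when each family is preferable on real robots.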






Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kober, J., Peters, J. (2012). Reinforcement Learning in Robotics: A Survey. In: Wiering, M., van Otterlo, M. (eds) Reinforcement Learning. Adaptation, Learning, and Optimization, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27645-3_18


  • DOI: https://doi.org/10.1007/978-3-642-27645-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27644-6

  • Online ISBN: 978-3-642-27645-3

  • eBook Packages: Engineering (R0)
