Policy Learning for Motor Skills

  • Jan Peters
  • Stefan Schaal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4985)


Policy learning that allows autonomous robots to adapt to novel situations has been a long-standing vision of robotics, artificial intelligence, and the cognitive sciences. To date, however, learning techniques have yet to fulfill this promise: only a few methods scale to the high-dimensional domains of manipulator robotics, let alone the emerging field of humanoid robotics, and scaling has usually been achieved only in precisely pre-structured domains. In this paper, we investigate the ingredients of a general approach to policy learning, with the goal of applying it to motor skill refinement and thereby moving one step closer to human-like performance. To do so, we study two major components of such an approach: first, policy learning algorithms that can be applied in the general setting of motor skill learning, and second, a theoretically well-founded general approach to representing the control structures required for task representation and execution.
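Although the paper builds on more refined methods (e.g., the natural actor-critic), the underlying policy-gradient idea can be sketched with a toy episodic REINFORCE update on a one-dimensional Gaussian policy. This is a minimal illustration only; the function name, learning rate, reward function, and baseline scheme below are assumptions for the sketch, not details from the paper:

```python
import numpy as np

def reinforce(episodes=2000, lr=0.05, seed=0):
    """Episodic REINFORCE with a 1-D Gaussian policy (toy sketch).

    The policy draws an action a ~ N(theta, sigma^2); the reward
    r = -(a - target)^2 peaks when the action hits the target.
    The score function (a - theta) / sigma^2, scaled by the
    baseline-subtracted reward, gives an unbiased gradient estimate.
    """
    rng = np.random.default_rng(seed)
    theta, sigma, target = 0.0, 0.5, 2.0
    baseline = 0.0
    for _ in range(episodes):
        a = rng.normal(theta, sigma)           # sample action from policy
        r = -(a - target) ** 2                 # toy reward, peak at target
        baseline += 0.01 * (r - baseline)      # running-average baseline
        grad_logp = (a - theta) / sigma**2     # score of the Gaussian mean
        theta += lr * (r - baseline) * grad_logp
    return theta

theta = reinforce()
```

With these settings the policy mean `theta` drifts toward the reward peak at 2.0; the baseline subtraction only reduces variance and leaves the gradient estimate unbiased, which is the same variance-reduction idea that actor-critic methods develop further.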


Keywords: Motor Skill · Reinforcement Learning · Humanoid Robot · Neural Information Processing System · Partially Observable Markov Decision Process





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jan Peters 1, 2
  • Stefan Schaal 2, 3

  1. Max-Planck Institute for Biological Cybernetics, Tübingen, Germany
  2. University of Southern California, Los Angeles, USA
  3. ATR Computational Neuroscience Laboratory, Soraku-gun, Kyoto, Japan
