
Imitation and Reinforcement Learning for Motor Primitives with Perceptual Coupling

  • Jens Kober
  • Betty Mohler
  • Jan Peters
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 264)

Abstract

Traditional motor primitive approaches deal largely with open-loop policies, which can cope only with small perturbations. In this paper, we present a new type of motor primitive policy that serves as a closed-loop policy, together with an appropriate learning algorithm. Our new motor primitives are an augmented version of the dynamical system-based motor primitives [Ijspeert et al(2002)Ijspeert, Nakanishi, and Schaal] that incorporates perceptual coupling to external variables. We show that these motor primitives can perform complex tasks such as the Ball-in-a-Cup or Kendama task even with large variances in the initial conditions, where even a skilled human player would be challenged. We initialize the open-loop policies by imitation learning and the perceptual coupling with a handcrafted solution. We first improve the open-loop policies and subsequently the perceptual coupling using a novel reinforcement learning method that is particularly well-suited for dynamical system-based motor primitives.
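To make the idea concrete, the sketch below shows a single degree-of-freedom dynamical-system motor primitive with an additive perceptual-coupling term driven by an external variable (e.g., the perceived ball position in Ball-in-a-Cup). This is a minimal illustrative sketch only: the class name, gain values, basis-function layout, and the exact form of the coupling term are assumptions, not the formulation published in the chapter.

```python
import numpy as np

# Minimal sketch of a dynamical-system motor primitive (DMP) with an additive
# perceptual-coupling term. Names, gains, and the coupling form are
# illustrative assumptions, not the authors' exact formulation.
class CoupledMotorPrimitive:
    def __init__(self, n_basis=10, alpha_z=25.0, beta_z=6.25, alpha_x=8.0, tau=1.0):
        self.alpha_z, self.beta_z = alpha_z, beta_z      # transformed-system gains
        self.alpha_x, self.tau = alpha_x, tau            # canonical decay, movement duration
        self.centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))  # basis centres in phase x
        self.widths = n_basis ** 1.5 / self.centers      # simple width heuristic
        self.w = np.zeros(n_basis)       # open-loop shape parameters (imitation + RL)
        self.kappa = np.zeros(n_basis)   # perceptual-coupling parameters (RL)

    def _features(self, x):
        # Normalised Gaussian basis functions weighted by the phase variable x.
        psi = np.exp(-self.widths * (x - self.centers) ** 2)
        return psi * x / (psi.sum() + 1e-10)

    def step(self, y, yd, x, goal, ext_err, dt):
        """One Euler integration step.

        ext_err is the deviation of the perceived external variable from its
        expected value (e.g. the ball position relative to the demonstration);
        with kappa = 0 the primitive reduces to the usual open-loop DMP.
        """
        phi = self._features(x)
        forcing = phi @ self.w                    # learned open-loop shape term
        coupling = (phi @ self.kappa) * ext_err   # closed-loop perceptual feedback
        ydd = (self.alpha_z * (self.beta_z * (goal - y) - yd)
               + forcing + coupling) / self.tau ** 2
        yd += ydd * dt
        y += yd * dt
        x += -self.alpha_x * x / self.tau * dt    # canonical system: dx/dt = -alpha_x * x / tau
        return y, yd, x
```

A rollout simply integrates `step` from x = 1 until the phase has decayed, feeding in the currently perceived external error at every step; imitation learning would fit `w` to a demonstration, and reinforcement learning would then adjust `w` and `kappa`.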

Keywords

Reinforcement Learning, Humanoid Robot, Neural Information Processing System, External Variable, Canonical System


References

  1. [Andrieu et al(2003)Andrieu, de Freitas, Doucet, and Jordan]
    Andrieu, C., de Freitas, N., Doucet, A., Jordan, M.I.: An introduction to MCMC for machine learning. Machine Learning 50(1), 5–43 (2003)
  2. [Atkeson(1994)]
    Atkeson, C.G.: Using local trajectory optimizers to speed up global optimization in dynamic programming. In: Hanson, J.E., Moody, S.J., Lippmann, R.P. (eds.) Advances in Neural Information Processing Systems 6 (NIPS), pp. 503–521. Morgan Kaufmann, Denver (1994)
  3. [Guenter et al(2007)Guenter, Hersch, Calinon, and Billard]
    Guenter, F., Hersch, M., Calinon, S., Billard, A.: Reinforcement learning for imitating constrained reaching movements. Advanced Robotics, Special Issue on Imitative Robots 21(13), 1521–1544 (2007)
  4. [Howard et al(2009a)Howard, Klanke, Gienger, Goerick, and Vijayakumar]
    Howard, M., Klanke, S., Gienger, M., Goerick, C., Vijayakumar, S.: Methods for learning control policies from variable-constraint demonstrations. In: Sigaud, O., Peters, J. (eds.) From Motor Learning to Interaction Learning in Robots. SCI, vol. 264, pp. 253–291. Springer, Heidelberg (2010)
  5. [Howard et al(2009b)Howard, Klanke, Gienger, Goerick, and Vijayakumar]
    Howard, M., Klanke, S., Gienger, M., Goerick, C., Vijayakumar, S.: A novel method for learning policies from variable constraint data. Autonomous Robots (2009)
  6. [Ijspeert et al(2002)Ijspeert, Nakanishi, and Schaal]
    Ijspeert, A.J., Nakanishi, J., Schaal, S.: Movement imitation with nonlinear dynamical systems in humanoid robots. In: Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), Washington, DC, pp. 1398–1403 (2002)
  7. [Ijspeert et al(2003)Ijspeert, Nakanishi, and Schaal]
    Ijspeert, A.J., Nakanishi, J., Schaal, S.: Learning attractor landscapes for learning motor primitives. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15 (NIPS), pp. 1547–1554. MIT Press, Cambridge (2003)
  8. [Kober and Peters(2008)]
    Kober, J., Peters, J.: Policy search for motor primitives in robotics. In: Advances in Neural Information Processing Systems (NIPS) (2008)
  9. [Kulic and Nakamura(2009)]
    Kulic, D., Nakamura, Y.: Incremental learning of full body motion primitives. In: Sigaud, O., Peters, J. (eds.) From Motor Learning to Interaction Learning in Robots. SCI, vol. 264, pp. 383–406. Springer, Heidelberg (2010)
  10. [Miyamoto et al(1996)Miyamoto, Schaal, Gandolfo, Gomi, Koike, Osu, Nakano, Wada, and Kawato]
    Miyamoto, H., Schaal, S., Gandolfo, F., Gomi, H., Koike, Y., Osu, R., Nakano, E., Wada, Y., Kawato, M.: A kendama learning robot based on bi-directional theory. Neural Networks 9(8), 1281–1302 (1996)
  11. [Nakanishi et al(2004a)Nakanishi, Morimoto, Endo, Cheng, Schaal, and Kawato]
    Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., Kawato, M.: A framework for learning biped locomotion with dynamic movement primitives. In: Proc. IEEE-RAS Int. Conf. on Humanoid Robots (HUMANOIDS), Santa Monica, CA, November 10-12. IEEE, Los Angeles (2004)
  12. [Nakanishi et al(2004b)Nakanishi, Morimoto, Endo, Cheng, Schaal, and Kawato]
    Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., Kawato, M.: Learning from demonstration and adaptation of biped locomotion. Robotics and Autonomous Systems (RAS) 47(2-3), 79–91 (2004)
  13. [Nakanishi et al(2007)Nakanishi, Mistry, Peters, and Schaal]
    Nakanishi, J., Mistry, M., Peters, J., Schaal, S.: Experimental evaluation of task space position/orientation control towards compliant control for humanoid robots. In: Proc. IEEE/RSJ 2007 Int. Conf. on Intell. Robots and Systems (IROS) (2007)
  14. [Peters and Schaal(2006)]
    Peters, J., Schaal, S.: Policy gradient methods for robotics. In: Proc. IEEE/RSJ 2006 Int. Conf. on Intell. Robots and Systems (IROS), Beijing, China, pp. 2219–2225 (2006)
  15. [Peters and Schaal(2007)]
    Peters, J., Schaal, S.: Reinforcement learning for operational space control. In: Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), Rome, Italy (2007)
  16. [Pongas et al(2005)Pongas, Billard, and Schaal]
    Pongas, D., Billard, A., Schaal, S.: Rapid synchronization and accurate phase-locking of rhythmic motor primitives. In: Proc. IEEE 2005 Int. Conf. on Intell. Robots and Systems (IROS), pp. 2911–2916 (2005)
  17. [Ratliff et al(2009)Ratliff, Silver, and Bagnell]
    Ratliff, N., Silver, D., Bagnell, J.: Learning to search: Functional gradient techniques for imitation learning. Autonomous Robots 27(1), 25–53 (2009)
  18. [Riedmiller et al(2009)Riedmiller, Gabel, Hafner, and Lange]
    Riedmiller, M., Gabel, T., Hafner, R., Lange, S.: Reinforcement learning for robot soccer. Autonomous Robots 27(1), 55–73 (2009)
  19. [Rückstieß et al(2008)Rückstieß, Felder, and Schmidhuber]
    Rückstieß, T., Felder, M., Schmidhuber, J.: State-dependent exploration for policy gradient methods. In: Proceedings of the European Conference on Machine Learning (ECML), pp. 234–249 (2008)
  20. [Sato et al(1993)Sato, Sakaguchi, Masutani, and Miyazaki]
    Sato, S., Sakaguchi, T., Masutani, Y., Miyazaki, F.: Mastering of a task with interaction between a robot and its environment: “kendama” task. Transactions of the Japan Society of Mechanical Engineers C 59(558), 487–493 (1993)
  21. [Schaal et al(2003)Schaal, Peters, Nakanishi, and Ijspeert]
    Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.J.: Control, planning, learning, and imitation with dynamic movement primitives. In: Proc. Workshop on Bilateral Paradigms on Humans and Humanoids, IEEE 2003 Int. Conf. on Intell. Robots and Systems (IROS), Las Vegas, NV, October 27-31 (2003)
  22. [Schaal et al(2007)Schaal, Mohajerian, and Ijspeert]
    Schaal, S., Mohajerian, P., Ijspeert, A.J.: Dynamics systems vs. optimal control — a unifying view. Progress in Brain Research 165(1), 425–445 (2007)
  23. [Shone et al(2000)Shone, Krudysz, and Brown]
    Shone, T., Krudysz, G., Brown, K.: Dynamic manipulation of kendama. Tech. rep., Rensselaer Polytechnic Institute (2000)
  24. [Sutton and Barto(1998)]
    Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
  25. [Takenaka(1984)]
    Takenaka, K.: Dynamical control of manipulator with vision: “cup and ball” game demonstrated by robot. Transactions of the Japan Society of Mechanical Engineers C 50(458), 2046–2053 (1984)
  26. [Urbanek et al(2004)Urbanek, Albu-Schäffer, and van der Smagt]
    Urbanek, H., Albu-Schäffer, A., van der Smagt, P.: Learning from demonstration repetitive movements for autonomous service robotics. In: Proc. IEEE/RSJ 2004 Int. Conf. on Intell. Robots and Systems (IROS), Sendai, Japan, pp. 3495–3500 (2004)
  27. [Wikipedia(2008)]
  28. [Williams(1992)]
    Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)
  29. [Wulf(2007)]
    Wulf, G.: Attention and motor skill learning. Human Kinetics, Urbana-Champaign (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jens Kober (1)
  • Betty Mohler (1)
  • Jan Peters (1)
  1. Max Planck Institute for Biological Cybernetics, Tübingen, Germany
