Imitation and Reinforcement Learning for Motor Primitives with Perceptual Coupling

  • Jens Kober
  • Betty Mohler
  • Jan Peters

Abstract

Traditional motor primitive approaches largely rely on open-loop policies, which can only cope with small perturbations. In this paper, we present a new type of motor primitive policy that serves as a closed-loop policy, together with an appropriate learning algorithm. Our new motor primitives are an augmented version of the dynamical system-based motor primitives [Ijspeert et al(2002)Ijspeert, Nakanishi, and Schaal] that incorporates perceptual coupling to external variables. We show that these motor primitives can perform complex tasks such as the Ball-in-a-Cup or Kendama task even with large variance in the initial conditions, where even a skilled human player would be challenged. We initialize the open-loop policies by imitation learning and the perceptual coupling with a handcrafted solution. We first improve the open-loop policies and subsequently the perceptual coupling using a novel reinforcement learning method that is particularly well-suited for dynamical system-based motor primitives.
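
Since the abstract assumes familiarity with dynamical system-based motor primitives, the following LaTeX sketch shows the standard formulation of [Ijspeert et al(2002)Ijspeert, Nakanishi, and Schaal] with a generic perceptual-coupling term added. The external variable \bar{y}, its desired trajectory \bar{y}^*, and the gain matrices \hat{K} and \hat{D} are illustrative placeholders and not necessarily the exact parametrization used in this chapter.

\begin{align}
  % Canonical system: a phase variable z replaces explicit time.
  \tau \dot{z} &= -\alpha_z z \\
  % Transformation system of the open-loop primitive (goal g, position x, velocity v):
  \tau \dot{v} &= \alpha_v \bigl( \beta_v (g - x) - v \bigr) + f(z), \qquad \tau \dot{x} = v \\
  % Learned forcing term: normalized basis functions \psi_i with weights w_i.
  f(z) &= \frac{\sum_i \psi_i(z)\, w_i\, z}{\sum_i \psi_i(z)} \\
  % Perceptual coupling (illustrative form): feedback from an external variable
  % \bar{y} (e.g., the ball state) toward a desired trajectory \bar{y}^*, with
  % gain matrices \hat{K}, \hat{D} that can be adapted by reinforcement learning.
  \tau \dot{v} &= \alpha_v \bigl( \beta_v (g - x) - v \bigr) + f(z)
                 + \hat{K} (\bar{y}^* - \bar{y}) + \hat{D} (\dot{\bar{y}}^* - \dot{\bar{y}})
\end{align}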

References

  1. [Andrieu et al(2003)Andrieu, de Freitas, Doucet, and Jordan]
    Andrieu, C., de Freitas, N., Doucet, A., Jordan, M.I.: An introduction to MCMC for machine learning. Machine Learning 50(1), 5–43 (2003)
  2. [Atkeson(1994)]
    Atkeson, C.G.: Using local trajectory optimizers to speed up global optimization in dynamic programming. In: Hanson, J.E., Moody, S.J., Lippmann, R.P. (eds.) Advances in Neural Information Processing Systems 6 (NIPS), pp. 503–521. Morgan Kaufmann, Denver (1994)
  3. [Guenter et al(2007)Guenter, Hersch, Calinon, and Billard]
    Guenter, F., Hersch, M., Calinon, S., Billard, A.: Reinforcement learning for imitating constrained reaching movements. Advanced Robotics, Special Issue on Imitative Robots 21(13), 1521–1544 (2007)
  4. [Howard et al(2009a)Howard, Klanke, Gienger, Goerick, and Vijayakumar]
    Howard, M., Klanke, S., Gienger, M., Goerick, C., Vijayakumar, S.: Methods for learning control policies from variable-constraint demonstrations. In: Sigaud, O., Peters, J. (eds.) From Motor Learning to Interaction Learning in Robots. SCI, vol. 264, pp. 253–291. Springer, Heidelberg (2010)
  5. [Howard et al(2009b)Howard, Klanke, Gienger, Goerick, and Vijayakumar]
    Howard, M., Klanke, S., Gienger, M., Goerick, C., Vijayakumar, S.: A novel method for learning policies from variable constraint data. Autonomous Robots (2009b)
  6. [Ijspeert et al(2002)Ijspeert, Nakanishi, and Schaal]
    Ijspeert, A.J., Nakanishi, J., Schaal, S.: Movement imitation with nonlinear dynamical systems in humanoid robots. In: Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), Washington, DC, pp. 1398–1403 (2002)
  7. [Ijspeert et al(2003)Ijspeert, Nakanishi, and Schaal]
    Ijspeert, A.J., Nakanishi, J., Schaal, S.: Learning attractor landscapes for learning motor primitives. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15 (NIPS), pp. 1547–1554. MIT Press, Cambridge (2003)
  8. [Kober and Peters(2008)]
    Kober, J., Peters, J.: Policy search for motor primitives in robotics. In: Advances in Neural Information Processing Systems, NIPS (2008)
  9. [Kulic and Nakamura(2009)]
    Kulic, D., Nakamura, Y.: Incremental learning of full body motion primitives. In: Sigaud, O., Peters, J. (eds.) From Motor Learning to Interaction Learning in Robots. SCI, vol. 264, pp. 383–406. Springer, Heidelberg (2010)
  10. [Miyamoto et al(1996)Miyamoto, Schaal, Gandolfo, Gomi, Koike, Osu, Nakano, Wada, and Kawato]
    Miyamoto, H., Schaal, S., Gandolfo, F., Gomi, H., Koike, Y., Osu, R., Nakano, E., Wada, Y., Kawato, M.: A kendama learning robot based on bi-directional theory. Neural Networks 9(8), 1281–1302 (1996)
  11. [Nakanishi et al(2004a)Nakanishi, Morimoto, Endo, Cheng, Schaal, and Kawato]
    Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., Kawato, M.: A framework for learning biped locomotion with dynamic movement primitives. In: Proc. IEEE-RAS Int. Conf. on Humanoid Robots (HUMANOIDS), Santa Monica, CA, November 10-12. IEEE, Los Angeles (2004)
  12. [Nakanishi et al(2004b)Nakanishi, Morimoto, Endo, Cheng, Schaal, and Kawato]
    Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., Kawato, M.: Learning from demonstration and adaptation of biped locomotion. Robotics and Autonomous Systems (RAS) 47(2-3), 79–91 (2004)
  13. [Nakanishi et al(2007)Nakanishi, Mistry, Peters, and Schaal]
    Nakanishi, J., Mistry, M., Peters, J., Schaal, S.: Experimental evaluation of task space position/orientation control towards compliant control for humanoid robots. In: Proc. IEEE/RSJ 2007 Int. Conf. on Intell. Robotics Systems, IROS (2007)
  14. [Peters and Schaal(2006)]
    Peters, J., Schaal, S.: Policy gradient methods for robotics. In: Proc. IEEE/RSJ 2006 Int. Conf. on Intell. Robots and Systems (IROS), Beijing, China, pp. 2219–2225 (2006)
  15. [Peters and Schaal(2007)]
    Peters, J., Schaal, S.: Reinforcement learning for operational space control. In: Proc. Int. Conference on Robotics and Automation (ICRA), Rome, Italy (2007)
  16. [Pongas et al(2005)Pongas, Billard, and Schaal]
    Pongas, D., Billard, A., Schaal, S.: Rapid synchronization and accurate phase-locking of rhythmic motor primitives. In: Proc. IEEE 2005 Int. Conf. on Intell. Robots and Systems (IROS), pp. 2911–2916 (2005)
  17. [Ratliff et al(2009)Ratliff, Silver, and Bagnell]
    Ratliff, N., Silver, D., Bagnell, J.: Learning to search: Functional gradient techniques for imitation learning. Autonomous Robots 27(1), 25–53 (2009)
  18. [Riedmiller et al(2009)Riedmiller, Gabel, Hafner, and Lange]
    Riedmiller, M., Gabel, T., Hafner, R., Lange, S.: Reinforcement learning for robot soccer. Autonomous Robots 27(1), 55–73 (2009)
  19. [Rückstieß et al(2008)Rückstieß, Felder, and Schmidhuber]
    Rückstieß, T., Felder, M., Schmidhuber, J.: State-dependent exploration for policy gradient methods. In: Proceedings of the European Conference on Machine Learning (ECML), pp. 234–249 (2008)
  20. [Sato et al(1993)Sato, Sakaguchi, Masutani, and Miyazaki]
    Sato, S., Sakaguchi, T., Masutani, Y., Miyazaki, F.: Mastering of a task with interaction between a robot and its environment: “kendama” task. Transactions of the Japan Society of Mechanical Engineers C 59(558), 487–493 (1993)
  21. [Schaal et al(2003)Schaal, Peters, Nakanishi, and Ijspeert]
    Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.J.: Control, planning, learning, and imitation with dynamic movement primitives. In: Proc. Workshop on Bilateral Paradigms on Humans and Humanoids, IEEE 2003 Int. Conf. on Intell. Robots and Systems (IROS), Las Vegas, NV, October 27-31 (2003)
  22. [Schaal et al(2007)Schaal, Mohajerian, and Ijspeert]
    Schaal, S., Mohajerian, P., Ijspeert, A.J.: Dynamics systems vs. optimal control — a unifying view. Progress in Brain Research 165(1), 425–445 (2007)
  23. [Shone et al(2000)Shone, Krudysz, and Brown]
    Shone, T., Krudysz, G., Brown, K.: Dynamic manipulation of kendama. Tech. rep., Rensselaer Polytechnic Institute (2000)
  24. [Sutton and Barto(1998)]
    Sutton, R., Barto, A.: Reinforcement Learning. MIT Press, Cambridge (1998)
  25. [Takenaka(1984)]
    Takenaka, K.: Dynamical control of manipulator with vision: “cup and ball” game demonstrated by robot. Transactions of the Japan Society of Mechanical Engineers C 50(458), 2046–2053 (1984)
  26. [Urbanek et al(2004)Urbanek, Albu-Schäffer, and van der Smagt]
    Urbanek, H., Albu-Schäffer, A., van der Smagt, P.: Learning from demonstration repetitive movements for autonomous service robotics. In: Proc. IEEE/RSJ 2004 Int. Conf. on Intell. Robots and Systems (IROS), Sendai, Japan, pp. 3495–3500 (2004)
  27. [Wikipedia(2008)]
  28. [Williams(1992)]
    Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)
  29. [Wulf(2007)]
    Wulf, G.: Attention and motor skill learning. Human Kinetics, Urbana Champaign (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jens Kober (1)
  • Betty Mohler (1)
  • Jan Peters (1)

  1. Max Planck Institute for Biological Cybernetics, Tübingen, Germany
