Learning New Basic Movements for Robotics

  • Jens Kober
  • Jan Peters
Part of the Informatik aktuell book series (INFORMAT)

Abstract

Obtaining novel skills is one of the most important problems in robotics. Machine learning techniques may be a promising approach for automatic and autonomous acquisition of movement policies. However, this requires both an appropriate policy representation and suitable learning algorithms. Employing the most recent form of the dynamical systems motor primitives originally introduced by Ijspeert et al. [1], we show how both discrete and rhythmic tasks can be learned using a concerted approach of both imitation and reinforcement learning, and present our current best performing learning algorithms. Finally, we show that it is possible to include a start-up phase in rhythmic primitives. We apply our approach to two elementary movements, i.e., Ball-in-a-Cup and Ball-Paddling, which can be learned on a real Barrett WAM robot arm at a pace similar to human learning.
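For readers unfamiliar with the dynamical systems motor primitives of Ijspeert et al. [1], the sketch below illustrates the idea for a single discrete (point-to-point) degree of freedom: a stable second-order transformation system is driven by a learnable forcing term whose shape weights can be initialized from a demonstration (imitation learning) and later refined by reinforcement learning. This is a minimal illustration in Python/NumPy under assumed gains and basis-function placement; it is not the implementation used in the chapter, and the names (DiscreteDMP, alpha_z, beta_z, alpha_x) are chosen here purely for exposition.

    # Minimal sketch of a discrete dynamical-systems motor primitive (DMP)
    # in the spirit of Ijspeert et al. [1]; gains and basis placement are
    # illustrative assumptions, not the chapter's implementation.
    import numpy as np

    class DiscreteDMP:
        def __init__(self, n_basis=20, alpha_z=25.0, beta_z=6.25, alpha_x=8.0):
            self.n_basis = n_basis
            self.alpha_z, self.beta_z, self.alpha_x = alpha_z, beta_z, alpha_x
            # Basis-function centres spread along the decaying canonical phase x.
            self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
            self.h = 1.0 / np.diff(self.c, append=self.c[-1] * 0.5) ** 2
            self.w = np.zeros(n_basis)

        def _psi(self, x):
            # Gaussian-like basis functions over the phase variable.
            return np.exp(-self.h * (x - self.c) ** 2)

        def _forcing(self, x, y0, g):
            # Normalized, weighted forcing term that shapes the movement.
            psi = self._psi(x)
            return (psi @ self.w) / (psi.sum() + 1e-10) * x * (g - y0)

        def imitate(self, y_demo, dt):
            """Fit the shape weights from one demonstrated trajectory (imitation)."""
            tau = dt * (len(y_demo) - 1)
            yd = np.gradient(y_demo, dt)
            ydd = np.gradient(yd, dt)
            y0, g = y_demo[0], y_demo[-1]
            t = np.arange(len(y_demo)) * dt
            x = np.exp(-self.alpha_x * t / tau)          # canonical phase over time
            # Target forcing term obtained by inverting the transformation system.
            f_target = tau ** 2 * ydd - self.alpha_z * (self.beta_z * (g - y_demo) - tau * yd)
            # Locally weighted regression: one weight per basis function.
            s = x * (g - y0)
            for i in range(self.n_basis):
                psi_i = np.exp(-self.h[i] * (x - self.c[i]) ** 2)
                self.w[i] = (s * psi_i) @ f_target / ((s * psi_i) @ s + 1e-10)
            return y0, g, tau

        def rollout(self, y0, g, tau, dt, n_steps):
            """Integrate the primitive with Euler steps to reproduce the movement."""
            y, z, x = y0, 0.0, 1.0
            traj = np.empty(n_steps)
            for k in range(n_steps):
                f = self._forcing(x, y0, g)
                zd = (self.alpha_z * (self.beta_z * (g - y) - z) + f) / tau
                yd = z / tau
                xd = -self.alpha_x * x / tau
                z, y, x = z + zd * dt, y + yd * dt, x + xd * dt
                traj[k] = y
            return traj

A typical use would be y0, g, tau = dmp.imitate(y_demo, dt) followed by dmp.rollout(y0, g, tau, dt, len(y_demo)); in the rhythmic case the decaying phase variable is replaced by a periodic one, and the refinement of the weights by reinforcement learning is where the chapter's learning algorithms come in.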

References

  1. A. J. Ijspeert, J. Nakanishi, and S. Schaal, “Learning attractor landscapes for learning motor primitives,” in Advances in Neural Information Processing Systems (NIPS), 2003.
  2. T. Flash and B. Hochner, “Motor primitives in vertebrates and invertebrates,” Current Opinion in Neurobiology, vol. 15, pp. 660–666, 2005.
  3. S. Schaal, J. Peters, J. Nakanishi, and A. J. Ijspeert, “Learning movement primitives,” in International Symposium on Robotics Research 2003 (ISRR), ser. Springer Tracts in Advanced Robotics, 2004, pp. 561–572.
  4. J. Peters and S. Schaal, “Policy gradient methods for robotics,” in Proceedings of the IEEE/RSJ 2006 International Conference on Intelligent Robots and Systems (IROS), 2006, pp. 2219–2225.
  5. S. Schaal, J. Peters, J. Nakanishi, and A. J. Ijspeert, “Control, planning, learning, and imitation with dynamic movement primitives,” in Proceedings of the Workshop on Bilateral Paradigms on Humans and Humanoids, IEEE 2003 International Conference on Intelligent Robots and Systems (IROS), 2003.
  6. F. Guenter, M. Hersch, S. Calinon, and A. Billard, “Reinforcement learning for imitating constrained reaching movements,” Advanced Robotics, Special Issue on Imitative Robots, vol. 21, no. 13, pp. 1521–1544, 2007.
  7. H. Urbanek, A. Albu-Schäffer, and P. van der Smagt, “Learning from demonstration repetitive movements for autonomous service robotics,” in Proceedings of the IEEE/RSJ 2004 International Conference on Intelligent Robots and Systems (IROS), 2004, pp. 3495–3500.
  8. R. E. Bellman, Dynamic Programming. Princeton University Press, 1957.
  9. S. Schaal, P. Mohajerian, and A. J. Ijspeert, “Dynamics systems vs. optimal control — a unifying view,” Progress in Brain Research, vol. 165, no. 1, pp. 425–445, 2007.
  10. R. Sutton and A. Barto, Reinforcement Learning. MIT Press, 1998.
  11. J. Peters and S. Schaal, “Reinforcement learning for operational space control,” in Proceedings of the International Conference on Robotics and Automation (ICRA), 2007.
  12. R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, vol. 8, pp. 229–256, 1992.
  13. C. G. Atkeson, “Using local trajectory optimizers to speed up global optimization in dynamic programming,” in Advances in Neural Information Processing Systems 6 (NIPS), 1994.
  14. P. Dayan and G. E. Hinton, “Using expectation-maximization for reinforcement learning,” Neural Computation, vol. 9, no. 2, pp. 271–278, 1997.
  15. J. Kober and J. Peters, “Policy search for motor primitives in robotics,” in Advances in Neural Information Processing Systems (NIPS), 2008.
  16. T. Rückstieß, M. Felder, and J. Schmidhuber, “State-dependent exploration for policy gradient methods,” in Proceedings of the European Conference on Machine Learning (ECML), 2008, pp. 234–249.
  17. Wikipedia, “Ball-in-a-cup,” January 2009. [Online]. Available: http://en.wikipedia.org/wiki/Ball_in_a_cup

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jens Kober (1)
  • Jan Peters (1)

  1. Max Planck Institute for Biological Cybernetics, Tübingen, Germany