Autonomous Robots, Volume 33, Issue 4, pp 361–379

Reinforcement learning to adjust parametrized motor primitives to new situations

  • Jens Kober
  • Andreas Wilhelm
  • Erhan Oztop
  • Jan Peters

Abstract

Humans manage to adapt learned movements very quickly to new situations by generalizing learned behaviors from similar situations. In contrast, robots currently often need to re-learn the complete movement. In this paper, we propose a method that learns to generalize parametrized motor plans by adapting a small set of global parameters, called meta-parameters. We employ reinforcement learning to learn the meta-parameters required to deal with the current situation, which is described by states. We introduce an appropriate reinforcement learning algorithm based on a kernelized version of reward-weighted regression. To show its feasibility, we evaluate this algorithm on a toy example and compare it to several previous approaches. Subsequently, we apply the approach to three robot tasks, namely the generalization of throwing movements in darts, of hitting movements in table tennis, and of throwing balls, where the tasks are learned on several different real physical robots: a Barrett WAM, a BioRob, the JST-ICORP/SARCOS CBi, and a Kuka KR 6.
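To make the meta-parameter learning step more concrete, the following is a minimal sketch of a reward-weighted (cost-regularized) kernel regression that maps situation states to meta-parameters. It assumes a Gaussian kernel and a simple per-sample regularization scheme; the function names, the `bandwidth` and `ridge` parameters, and the reward-weighting heuristic are illustrative assumptions and not the paper's exact formulation.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def predict_meta_parameters(states, meta_params, rewards, query_state,
                            bandwidth=1.0, ridge=0.1):
    """Reward-weighted kernel regression from states to meta-parameters.

    states:      (n, d) observed situations
    meta_params: (n, m) meta-parameters tried in those situations
    rewards:     (n,)   rewards obtained (higher is better)
    query_state: (d,)   new situation for which meta-parameters are needed

    This is a simplified sketch of the idea, not the paper's exact update rule.
    """
    K = gaussian_kernel(states, states, bandwidth)                 # (n, n)
    k = gaussian_kernel(query_state[None, :], states, bandwidth)   # (1, n)
    # Samples with low reward receive a large per-sample regularizer and
    # therefore have little influence on the prediction.
    C = ridge * np.diag(1.0 / np.maximum(rewards, 1e-6))
    prediction = k @ np.linalg.solve(K + C, meta_params)           # (1, m)
    return prediction.ravel()

# Purely illustrative usage with random data:
rng = np.random.default_rng(0)
S = rng.normal(size=(20, 2))        # 20 observed situations, 2 state features
P = rng.normal(size=(20, 3))        # meta-parameters tried in each situation
r = rng.uniform(0.1, 1.0, size=20)  # rewards obtained
print(predict_meta_parameters(S, P, r, query_state=np.array([0.5, -0.2])))
```

The key design choice this sketch reflects is that successful trials (high reward) dominate the regression, so the predicted meta-parameters for a new situation interpolate primarily between situations where the primitive worked well.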

Keywords

Skill learning · Motor primitives · Reinforcement learning · Meta-parameters · Policy learning


Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Jens Kober (1, 2)
  • Andreas Wilhelm (3)
  • Erhan Oztop (4, 5, 6)
  • Jan Peters (1, 2)

  1. MPI for Intelligent Systems, Tübingen, Germany
  2. TU Darmstadt, Darmstadt, Germany
  3. University of Applied Sciences Ravensburg-Weingarten, Weingarten, Germany
  4. NICT, Kyoto, Japan
  5. ATR, Kyoto, Japan
  6. Özyeğin University, Istanbul, Turkey
