Autonomous Robots, Volume 38, Issue 4, pp 331–348

Efficient policy search in low-dimensional embedding spaces by generalizing motion primitives with a parameterized skill memory

  • René Felix Reinhart
  • Jochen Jakob Steil


Abstract

Motion primitives are an established paradigm for generating complex motions from simpler building blocks. A much less addressed issue is at which level to encode motion primitives and how to organize them in a library. Typically, the intrinsic variability of a skill is significantly lower-dimensional than the parameter space of motion primitive models. In a first step, this paper therefore proposes a parameterized skill memory, which organizes a set of motion primitives in a low-dimensional, topology-preserving embedding space. The skill memory acts as a pivotal mechanism that links low-dimensional skill parametrizations to motion primitive parameters and complete motion trajectories. It is implemented by means of a dynamical system that features continuous generalization of motion shapes and multi-directional retrieval of motion primitive parameters from low-dimensional skill parametrizations. The skill parametrization can be predefined or automatically discovered, e.g. by unsupervised dimension reduction techniques. The paper shows that parameterized skill memories achieve excellent generalization of motion shapes from few training examples in several scenarios, including the bi-manual manipulation of a rod with the humanoid robot iCub. In a second step, the low-dimensional and topological skill parametrization is leveraged for efficient, gradient-based policy search. Policy search by generalizing motion shapes from low-dimensional parametrizations is compared to conventional policy search in the parameter space of a motion primitive model. The reduced search space accessible through the skill memory significantly accelerates policy improvement.
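The idea of searching in a low-dimensional skill space instead of the full primitive parameter space can be illustrated with a toy sketch. This is not the skill memory described above: the dynamical-system memory is replaced here by a fixed linear decoder, the rollout cost is a simple squared distance to a target parameter vector, and the gradient is estimated by finite differences. All names (`decode`, `rollout_cost`, `gradient_search`) and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def decode(s, mean, basis):
    """Map skill parameters s to primitive parameters w = mean + sum_k s[k] * basis[k].

    Assumption: a linear decoder stands in for the skill memory.
    """
    w = list(mean)
    for s_k, b_k in zip(s, basis):
        for i, b in enumerate(b_k):
            w[i] += s_k * b
    return w


def fd_gradient(f, x, eps=1e-4):
    """Central finite-difference gradient; needs 2 * len(x) evaluations of f."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * eps))
    return grad


def gradient_search(f, x0, step, iters):
    """Plain gradient descent on f, starting from x0."""
    x = list(x0)
    for _ in range(iters):
        g = fd_gradient(f, x)
        x = [x_i - step * g_i for x_i, g_i in zip(x, g)]
    return x


# Hypothetical setup: 20 primitive parameters, 2 skill parameters.
DIM = 20
a = 1.0 / math.sqrt(DIM // 2)
mean = [0.0] * DIM
basis = [[a] * 10 + [0.0] * 10,   # first embedding direction
         [0.0] * 10 + [a] * 10]   # second embedding direction (orthonormal to the first)
s_true = [1.5, -0.5]              # skill parameters of the (toy) optimal motion
target = decode(s_true, mean, basis)


def rollout_cost(s):
    """Toy cost of one rollout: squared distance of decoded parameters to the target."""
    w = decode(s, mean, basis)
    return sum((w_i - t_i) ** 2 for w_i, t_i in zip(w, target))


# Search in the 2-D skill space rather than the 20-D primitive parameter space.
s_opt = gradient_search(rollout_cost, [0.0, 0.0], step=0.25, iters=30)

# Cost evaluations needed per finite-difference gradient estimate.
evals_low = 2 * len(s_true)   # search in skill space
evals_full = 2 * DIM          # search in primitive parameter space
print(s_opt, evals_low, evals_full)
```

In this toy setting, each gradient estimate in the two-dimensional skill space needs 4 cost evaluations, whereas the same estimate in the 20-dimensional primitive parameter space would need 40. This only illustrates why a reduced search space can make policy improvement cheaper; it does not reproduce the paper's experiments or results.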


Keywords

Motion primitives, Policy search, Self-organization, Continuous association



Acknowledgments

The research leading to these results has received funding from the European Community’s 7th Framework Program FP7/2007–2013, Challenge 2 - Cognitive Systems, Interaction, Robotics - under Grant Agreement 248311 - AMARSi.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University, Bielefeld, Germany
