Efficient policy search in low-dimensional embedding spaces by generalizing motion primitives with a parameterized skill memory

Autonomous Robots

Abstract

Motion primitives are an established paradigm for generating complex motions from simpler building blocks. A much less addressed issue is at which level to encode a library of motion primitives and how to organize it. Typically, the intrinsic variability of a skill is significantly lower-dimensional than the parameter space of motion primitive models. In a first step, this paper therefore proposes a parameterized skill memory, which organizes a set of motion primitives in a low-dimensional, topology-preserving embedding space. The skill memory acts as a pivotal mechanism that links low-dimensional skill parametrization to motion primitive parameters and complete motion trajectories. The skill memory is implemented by means of a dynamical system which features continuous generalization of motion shapes and the multi-directional retrieval of motion primitive parameters from low-dimensional skill parametrizations. The skill parametrization can be predefined or automatically discovered, e.g., by unsupervised dimension reduction techniques. The paper shows that parameterized skill memories achieve excellent generalization of motion shapes from few training examples in several scenarios, including the bi-manual manipulation of a rod with the humanoid robot iCub. In a second step, the low-dimensional and topological skill parametrization is leveraged for efficient, gradient-based policy search. Policy search by generalizing motion shapes from low-dimensional parametrizations is compared to conventional policy search in the parameter space of a motion primitive model. The reduced search space accessible through the skill memory significantly accelerates policy improvement.
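
The comparison between the two search spaces can be illustrated with a toy sketch. It is not the method of the paper: the skill memory is replaced by a hypothetical fixed linear decoder (`decode`), the cost is a simple quadratic, the gradient is estimated by finite differences, and all names and dimensions (`D`, `d`, `W`, `theta_mean`, `z_star`) are assumptions chosen for illustration. Under these assumptions, both searches need the same number of gradient steps, but each gradient estimate in the embedding space requires far fewer rollouts.

```python
import numpy as np

rng = np.random.default_rng(0)

D, d = 20, 2  # primitive-parameter and skill-parameter dimensions (illustrative)

# Hypothetical stand-in for the skill memory: a fixed linear decoder with
# orthonormal columns, mapping skill parameters z (d-dim) to primitive
# parameters theta (D-dim). The paper uses a dynamical system instead.
W, _ = np.linalg.qr(rng.standard_normal((D, d)))
theta_mean = rng.standard_normal(D)

def decode(z):
    """Map a low-dimensional skill parameter to primitive parameters."""
    return theta_mean + W @ z

# Toy task: the optimal primitive lies on the embedded manifold.
z_star = np.array([0.7, -0.4])
theta_star = decode(z_star)

def cost(theta):
    """Rollout cost: squared distance to the optimal primitive."""
    return float(np.sum((theta - theta_star) ** 2))

def fd_gradient(f, x, eps=1e-4):
    """Central finite-difference gradient; needs 2 * len(x) cost evaluations."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def search(f, x0, lr=0.25, steps=30):
    """Plain gradient descent; returns the final point and the rollout count."""
    x, rollouts = x0.copy(), 0
    for _ in range(steps):
        x = x - lr * fd_gradient(f, x)
        rollouts += 2 * len(x)
    return x, rollouts

# Search in the low-dimensional embedding space (through the decoder).
z_final, n_low = search(lambda z: cost(decode(z)), np.zeros(d))
# Conventional search directly in the primitive parameter space.
theta_final, n_high = search(cost, theta_mean.copy())

print(f"embedding space: cost={cost(decode(z_final)):.2e}, rollouts={n_low}")
print(f"parameter space: cost={cost(theta_final):.2e}, rollouts={n_high}")
```

In this sketch the saving comes entirely from the cheaper gradient estimate. It only holds because the optimum lies on the decoded manifold; a task whose optimum lies off the manifold could not be solved exactly by the low-dimensional search.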

Acknowledgments

The research leading to these results has received funding from the European Community’s 7th Framework Program FP7/2007–2013, Challenge 2 - Cognitive Systems, Interaction, Robotics - under Grant Agreement 248311 - AMARSi.

Author information

Corresponding author

Correspondence to René Felix Reinhart.

About this article

Cite this article

Reinhart, R.F., Steil, J.J. Efficient policy search in low-dimensional embedding spaces by generalizing motion primitives with a parameterized skill memory. Auton Robot 38, 331–348 (2015). https://doi.org/10.1007/s10514-014-9417-9

  • DOI: https://doi.org/10.1007/s10514-014-9417-9
