Efficient policy search in low-dimensional embedding spaces by generalizing motion primitives with a parameterized skill memory
Motion primitives are an established paradigm to generate complex motions from simpler building blocks. A much less addressed issue is the level at which a library of motion primitives should be encoded and how it should be organized. Typically, the intrinsic variability of a skill is significantly lower-dimensional than the parameter space of motion primitive models. In a first step, this paper therefore proposes a parameterized skill memory, which organizes a set of motion primitives in a low-dimensional, topology-preserving embedding space. The skill memory acts as a pivotal mechanism that links the low-dimensional skill parametrization to motion primitive parameters and complete motion trajectories. It is implemented by means of a dynamical system which features continuous generalization of motion shapes and multi-directional retrieval of motion primitive parameters from low-dimensional skill parametrizations. The skill parametrization can be predefined or automatically discovered, e.g., by unsupervised dimension reduction techniques. The paper shows that parameterized skill memories achieve excellent generalization of motion shapes from few training examples in several scenarios, including the bi-manual manipulation of a rod with the humanoid robot iCub. In a second step, the low-dimensional and topological skill parametrization is leveraged for efficient, gradient-based policy search. Policy search by generalizing motion shapes from low-dimensional parametrizations is compared to conventional policy search in the parameter space of a motion primitive model. The comparison shows that the reduced search space accessible through the skill memory significantly accelerates policy improvement.
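To illustrate why a reduced search space can speed up gradient-based policy search, the following toy sketch compares finite-difference gradient descent in a full primitive parameter space with the same search in a low-dimensional skill space. It is not the paper's method: the fixed linear `decode` map is a hypothetical stand-in for the dynamical-system skill memory, the dimensions (20 and 2), the quadratic `cost`, and the step sizes are assumptions chosen for illustration, and the optimum is placed on the skill manifold by construction.

```python
import math

D_PRIM, D_SKILL = 20, 2  # hypothetical dimensions, chosen for illustration

# Assumption: a fixed linear decoder stands in for the learned skill memory.
DECODER = [[math.cos(i), math.sin(i)] for i in range(D_PRIM)]


def decode(skill):
    """Map low-dimensional skill parameters to primitive parameters."""
    return [sum(w * s for w, s in zip(row, skill)) for row in DECODER]


# Assumption: the optimal primitive parameters lie on the skill manifold.
TARGET = decode([0.7, -0.4])


def cost(theta):
    """Toy rollout cost: squared distance to the target parameters."""
    return sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def fd_gradient(f, x, eps=1e-4):
    """Central finite differences; needs 2 * len(x) evaluations of f."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def search(f, x0, step, iters=20):
    """Plain gradient descent; returns (solution, rollouts spent on gradients)."""
    x, rollouts = list(x0), 0
    for _ in range(iters):
        x = [xi - step * gi for xi, gi in zip(x, fd_gradient(f, x))]
        rollouts += 2 * len(x)
    return x, rollouts


# Search directly in the primitive parameter space.
theta, n_prim = search(cost, [0.0] * D_PRIM, step=0.25)
# Search in the skill space and decode before evaluating the cost.
skill, n_skill = search(lambda s: cost(decode(s)), [0.0] * D_SKILL, step=0.025)

print("primitive space:", n_prim, "rollouts, final cost", cost(theta))
print("skill space:    ", n_skill, "rollouts, final cost", cost(decode(skill)))
```

The step sizes are hand-tuned so that both searches contract at a similar rate per iteration. In this toy setting the difference in rollouts therefore comes only from the number of dimensions that the finite-difference gradient estimate has to probe.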
Keywords: Motion primitives, Policy search, Self-organization, Continuous association