Autonomous Robots, Volume 42, Issue 1, pp 45–64

Learning motions from demonstrations and rewards with time-invariant dynamical systems based policies

  • Joel Rey
  • Klas Kronander
  • Farbod Farshidian
  • Jonas Buchli
  • Aude Billard


An important challenge when using reinforcement learning to learn motions in robotics is the choice of policy parameterization. Following the learning-from-demonstration paradigm, we use Gaussian Mixture Regression to extract a parameterization with relevant non-linear features from a set of demonstrations of a motion. The resulting parameterization takes the form of a non-linear time-invariant dynamical system (DS), which we use as a parameterized policy for a variant of the PI2 policy search algorithm. This paper contributes by adapting PI2 to this time-invariant motion representation. We introduce two novel parameter exploration schemes that can be used to (1) sample model parameters to achieve uniform exploration in state space and (2) explore while ensuring stability of the resulting motion model. Additionally, a state-dependent stiffness profile is learned simultaneously with the reference trajectory, and the two are used together in a variable impedance control architecture. The learning architecture is validated in a hardware experiment consisting of a digging task on a KUKA LWR platform.
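To make the abstract's core idea concrete, the following is a minimal sketch (not the authors' code) of how Gaussian Mixture Regression can turn a GMM fitted on demonstration pairs [x; ẋ] into a time-invariant dynamical-system policy ẋ = f(x). The function name and the single-component test parameters are illustrative assumptions.

```python
import numpy as np

def gmr_velocity(x, priors, means, covs):
    """Gaussian Mixture Regression: condition a GMM over the joint
    vector [x; xdot] on the current position x, yielding the
    velocity xdot = f(x) of a time-invariant dynamical system.

    means[k] stacks [mu_x; mu_xd]; covs[k] is partitioned the same way.
    """
    d = len(x)                      # dimension of the position x
    K = len(priors)                 # number of mixture components
    h = np.zeros(K)                 # component responsibilities h_k(x)
    local = np.zeros((K, d))        # per-component conditional means
    for k in range(K):
        mu_x, mu_xd = means[k][:d], means[k][d:]
        Sxx = covs[k][:d, :d]       # input-input covariance block
        Sdx = covs[k][d:, :d]       # output-input covariance block
        diff = x - mu_x
        # unnormalized Gaussian weight of component k at x
        h[k] = priors[k] * np.exp(-0.5 * diff @ np.linalg.solve(Sxx, diff)) \
               / np.sqrt(np.linalg.det(2 * np.pi * Sxx))
        # conditional mean of xdot given x under component k
        local[k] = mu_xd + Sdx @ np.linalg.solve(Sxx, diff)
    h /= h.sum()                    # normalize responsibilities
    return h @ local                # smooth non-linear vector field
```

With a single component whose output-input block is −I (so the conditional mean is ẋ = −x), the regression reproduces a stable linear attractor at the origin; with several components fitted to demonstrations, the weighted blend of their local linear models produces the non-linear DS described above.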


Keywords: Dynamical systems · Reinforcement learning · Manipulation



This research was funded by the European Union Seventh Framework Programme FP7/2007-2013 under Grant Agreement No. 288533 ROBOHOW.COG and by the Swiss National Science Foundation through the National Center of Competence in Research Robotics.


Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Learning Algorithms and Systems Laboratory (LASA), EPFL, Lausanne, Switzerland
  2. Agile and Dexterous Robotics Lab (ADRL), ETH Zurich, Zurich, Switzerland
