A Comparison of Policy Search in Joint Space and Cartesian Space for Refinement of Skills

  • Alexander Fabisch
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 980)


Imitation learning is a way to teach robots skills that are demonstrated by humans. Transferring skills between the different kinematic structures of human and robot seems to be straightforward in Cartesian space. Because of the correspondence problem, however, the result will most likely not be identical. This is why refinement is required, for example, by policy search. Policy search in Cartesian space is prone to reachability problems when using conventional inverse kinematic solvers. We propose a configurable approximate inverse kinematic solver and show that it can accelerate the refinement process considerably. We also empirically compare refinement in Cartesian space and refinement in joint space.
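To make the idea of an approximate inverse kinematic solver concrete, the following is a minimal sketch and not the solver proposed in the paper. It assumes a hypothetical planar two-link arm with unit link lengths and minimizes a weighted pose error by gradient descent with backtracking. The names `LINK_LENGTHS`, `forward_kinematics`, `pose_error`, and `approximate_ik`, as well as the weights `w_pos` and `w_rot`, are illustrative assumptions. The point is only that such a solver returns the closest achievable pose when the requested pose is unreachable, instead of failing.

```python
import math

LINK_LENGTHS = (1.0, 1.0)  # assumed link lengths of a toy planar arm


def forward_kinematics(q):
    """Return the end-effector pose (x, y, phi) for joint angles q."""
    l1, l2 = LINK_LENGTHS
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return x, y, q[0] + q[1]


def pose_error(q, target, w_pos, w_rot):
    """Weighted squared pose error; the weights make the solver configurable."""
    x, y, phi = forward_kinematics(q)
    # Wrap the orientation difference to (-pi, pi].
    d_phi = math.atan2(math.sin(phi - target[2]), math.cos(phi - target[2]))
    return (w_pos * ((x - target[0]) ** 2 + (y - target[1]) ** 2)
            + w_rot * d_phi ** 2)


def approximate_ik(target, q_init=(0.3, 0.3), w_pos=1.0, w_rot=0.0,
                   n_iter=500, eps=1e-6):
    """Minimize the pose error; returns (joint angles, remaining error)."""
    q = list(q_init)
    err = pose_error(q, target, w_pos, w_rot)
    for _ in range(n_iter):
        # Central finite-difference gradient of the error.
        grad = []
        for i in range(len(q)):
            q_hi = list(q)
            q_lo = list(q)
            q_hi[i] += eps
            q_lo[i] -= eps
            grad.append((pose_error(q_hi, target, w_pos, w_rot)
                         - pose_error(q_lo, target, w_pos, w_rot)) / (2 * eps))
        grad_sq = sum(g * g for g in grad)
        # Backtracking line search with a sufficient-decrease condition.
        step = 1.0
        improved = False
        while step > 1e-12:
            q_new = [qi - step * gi for qi, gi in zip(q, grad)]
            err_new = pose_error(q_new, target, w_pos, w_rot)
            if err_new <= err - 1e-4 * step * grad_sq and err_new < err:
                q, err, improved = q_new, err_new, True
                break
            step *= 0.5
        if not improved:
            break  # no improving step found: treat as converged
    return q, err


# The target lies outside the arm's reach of 2.0, so no exact solution exists.
q, err = approximate_ik((3.0, 0.0, 0.0))
print(forward_kinematics(q), err)
```

In this sketch, setting `w_rot` above zero trades position accuracy against orientation accuracy, which is one simple way such a solver could be made configurable.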


Learning from demonstration, Imitation learning, Reinforcement learning, Policy search, Inverse kinematics



We thank Manuel Meder for support in setting up simulation environments. This work was supported through grants of the German Federal Ministry of Economics and Technology (BMWi, FKZ 50RA1216 and 50RA1217), the German Federal Ministry for Economic Affairs and Energy (BMWi, FKZ 50RA1701), and the European Union’s Horizon 2020 research and innovation program (No H2020-FOF 2016 723853).



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. DFKI GmbH, Robotics Innovation Center, Bremen, Germany
