KI - Künstliche Intelligenz, Volume 29, Issue 4, pp 369–377

Accounting for Task-Difficulty in Active Multi-Task Robot Control Learning

  • Alexander Fabisch
  • Jan Hendrik Metzen
  • Mario Michael Krell
  • Frank Kirchner
Technical Contribution


Contextual policy search is a reinforcement learning approach to multi-task learning for robot control. It can be used to learn versatile skills that generalize over a range of tasks, each specified by a context vector. In this work, we combine contextual policy search with ideas from active learning to select the task in which the next trial will be performed. Moreover, we use active training set selection to reduce the detrimental effects of exploration in the sampling policy. A core challenge in this approach is that the distributions of the obtained rewards may not be directly comparable between different tasks. We propose the novel approach PUBSVE for estimating a reward baseline and investigate empirically, on benchmark problems and simulated robotic tasks, to what extent this method can remedy the issue of non-comparable rewards.


Keywords: Contextual policy search, Multi-task learning, Active learning



This work was supported through two grants of the German Federal Ministry of Economics and Technology (BMWi, FKZ 50 RA 1216 and FKZ 50 RA 1217).



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Alexander Fabisch (1)
  • Jan Hendrik Metzen (1, 2)
  • Mario Michael Krell (1)
  • Frank Kirchner (1, 2)
  1. Robotics Group, Universität Bremen, Bremen, Germany
  2. Robotics Innovation Center, German Research Center for Artificial Intelligence (DFKI), Bremen, Germany
