Abstract
Contextual policy search is a reinforcement learning approach to multi-task robot control learning. It can be used to learn broadly applicable skills that generalize over a range of tasks specified by a context vector. In this work, we combine contextual policy search with ideas from active learning to select the task in which the next trial will be performed. Moreover, we use active training set selection to reduce the detrimental effects of exploration in the sampling policy. A core challenge in this approach is that the distributions of the obtained rewards may not be directly comparable between tasks. We propose the novel approach PUBSVE for estimating a reward baseline and investigate empirically, on benchmark problems and simulated robotic tasks, to what extent this method can remedy the issue of non-comparable rewards.
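To make the setting concrete, the following is a minimal sketch of contextual policy search on a toy problem, not the paper's PUBSVE method: an upper-level policy, linear in the context, is improved by reward-weighted regression. The task (`reward`), features, and update rule are illustrative assumptions chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(theta, s):
    # Toy task: the optimal low-level parameter depends linearly
    # on the context, theta*(s) = 2*s (an assumed example task).
    return -(theta - 2.0 * s) ** 2

# Upper-level policy: theta ~ N(w0 + w1*s, sigma^2)
w = np.zeros(2)
sigma = 1.0

for it in range(50):
    # Sample contexts, draw parameters from the policy, collect rewards
    s = rng.uniform(0.0, 2.0, size=50)
    Phi = np.column_stack([np.ones_like(s), s])   # context features [1, s]
    theta = Phi @ w + sigma * rng.standard_normal(50)
    r = reward(theta, s)
    # Reward-weighted regression update: samples with higher reward
    # get exponentially larger weight in the linear fit.
    d = np.exp((r - r.max()) / (r.std() + 1e-8))
    W = np.diag(d)
    w = np.linalg.solve(Phi.T @ W @ Phi + 1e-6 * np.eye(2),
                        Phi.T @ W @ theta)
    sigma = max(0.9 * sigma, 0.05)  # simple exploration decay

print(w)  # policy mean should approximate theta*(s) = 2*s
```

The sketch glosses over what the paper addresses: when tasks differ in difficulty, raw rewards from different contexts are not on a common scale, so weighting samples by reward across tasks is biased without a per-task baseline.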
Acknowledgments
This work was supported by two grants of the German Federal Ministry of Economics and Technology (BMWi, FKZ 50 RA 1216 and FKZ 50 RA 1217).
Additional information
A. Fabisch and J. H. Metzen contributed equally.
Cite this article
Fabisch, A., Metzen, J.H., Krell, M.M. et al. Accounting for Task-Difficulty in Active Multi-Task Robot Control Learning. Künstl Intell 29, 369–377 (2015). https://doi.org/10.1007/s13218-015-0363-2