Abstract
Hierarchical Reinforcement Learning (HRL) is a viable approach to learning complex tasks, especially those with an inner hierarchical structure. HRL methods decompose the problem into a typically two-layered hierarchy. At the lower level, individual skills are trained to solve specific non-trivial subtasks, such as locomotion primitives. The high-level agent can then use these skills as its actions, enabling it to tackle the overall task. Identifying an appropriate skill set is, however, a difficult problem in itself. Most current approaches solve it with a pre-training phase in which the skills are trained and fixed before the training of the high-level agent begins. Fixing the skill set prior to the main training session can, however, introduce flaws into the HRL system, especially if a useful skill was not successfully identified and is therefore missing from the skill set. Our Adaptive Skill Acquisition (ASA) framework targets precisely these situations. It can be plugged into existing HRL architectures to fix defects in the pre-trained skill set. During the training of the high-level agent, ASA detects a missing skill, trains it, and integrates it into the existing system. In this paper, we present new improvements to the ASA framework, notably a new skill-training reward function and support for skill-stopping functions that enable better integration. Furthermore, we extend our earlier pilot tests into extensive experiments that evaluate the functionality of ASA against its theoretical bounds. The source code of ASA is available online (https://github.com/holasjuraj/asa).
This research was supported by KEGA grant no. 042UK-4/2019.
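To make the loop described in the abstract (detect a missing skill, train it, integrate it) concrete, the following is a minimal, self-contained Python sketch. The toy integer-state environment and all names (Skill, detect_missing_skill, train_skill, train_high_level_agent) are our own illustrative assumptions; they do not reflect the actual API of the linked repository or of the garage toolkit it builds on.

```python
import random
from typing import Callable, List, Optional


class Skill:
    """A low-level skill: a behaviour plus a skill-stopping function that
    decides when control returns to the high-level agent."""

    def __init__(self, name: str, stop_fn: Callable[[int], bool]):
        self.name = name
        self.stop_fn = stop_fn

    def run(self, state: int) -> int:
        # Dummy low-level behaviour: advance the integer state one unit
        # at a time until the stopping function fires.
        while True:
            state += 1
            if self.stop_fn(state):
                return state


def detect_missing_skill(failures: List[int], threshold: int = 3) -> Optional[int]:
    """Toy stand-in for ASA's deficiency detection: if the agent keeps
    getting stuck around the same state, a skill for that region is missing."""
    for s in set(failures):
        if failures.count(s) >= threshold:
            return s
    return None


def train_skill(target: int) -> Skill:
    """Toy stand-in for skill training; a real implementation would optimise
    a policy under a skill-training reward for reaching the sub-goal."""
    return Skill(f"reach_{target}", stop_fn=lambda s: s >= target)


def train_high_level_agent(goal: int = 10, n_iterations: int = 20) -> List[Skill]:
    skills = [Skill("step", stop_fn=lambda s: True)]  # incomplete pre-trained set
    failures: List[int] = []
    for _ in range(n_iterations):
        state = 0
        for _ in range(5):                    # high-level episode: pick 5 skills
            state = random.choice(skills).run(state)
        if state < goal:                      # task failed; remember where
            failures.append(state)
        gap = detect_missing_skill(failures)
        if gap is not None:
            # ASA step: train the missing skill on the fly and integrate it
            # into the high-level action space. In ASA proper, the new
            # skill's sub-goal is derived from the detected deficiency.
            skills.append(train_skill(goal))
            failures.clear()
    return skills


if __name__ == "__main__":
    final_skills = train_high_level_agent()
    print("Skill set after training:", [s.name for s in final_skills])
```

In this sketch the initial skill set cannot solve the task, so the detector repeatedly observes failures at the same state, triggers training of a bridging skill, and the extended skill set then lets the high-level agent reach the goal; the real framework performs the analogous steps with learned policies instead of hard-coded stubs.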