Abstract
Curriculum learning has the potential to address sparse rewards, a long-standing challenge in reinforcement learning (RL), with greater sample efficiency than traditional RL algorithms, because it lets agents learn tasks in a meaningful order, from simple to difficult. However, most curriculum learning in RL still relies on fixed, hand-designed task sequences. We present a novel automatic curriculum learning scheme for RL agents: a two-level hierarchical RL framework in which a high-level policy, the curriculum generator, proposes curricula for a low-level policy, the action policy, to learn during training. Our training methods guarantee that the proposed curricula are always moderately difficult for the action policy. The two policies are trained simultaneously and independently. After training, the action policy can complete all tasks without instructions from the curriculum generator. Experimental results on a wide range of benchmark robotics environments show that our method converges considerably faster and improves training quality compared with training without the curriculum generator.
This work is supported by the Shanghai Science and Technology Innovation Action Plan (No. 19511105900) and, in part, by the National Key Research and Development Project (No. 2018YFB1703201).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
He, Z., Gu, C., Xu, R., Wu, K. (2020). Automatic Curriculum Generation by Hierarchical Reinforcement Learning. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science(), vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_17
DOI: https://doi.org/10.1007/978-3-030-63833-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7
eBook Packages: Computer Science, Computer Science (R0)