Automatic Curriculum Generation by Hierarchical Reinforcement Learning

He, Zhenghua; Gu, Chaochen; Xu, Rui; Wu, Kaijie

doi:10.1007/978-3-030-63833-7_17

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12533)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Curriculum learning has the potential to solve the problem of sparse rewards, a long-standing challenge in reinforcement learning (RL), with greater sample efficiency than traditional RL algorithms, because it enables agents to learn tasks in a meaningful order: from simple tasks to difficult ones. However, most curriculum learning in RL still relies on fixed, hand-designed sequences of tasks. We present a novel scheme of automatic curriculum learning for RL agents. We propose a two-level hierarchical RL framework with a high-level policy, called the curriculum generator, and a low-level policy, called the action policy. During training, the curriculum generator automatically proposes curricula for the action policy to learn. Our training methods guarantee that the proposed curricula are always moderately difficult for the action policy. Both levels of policies are trained simultaneously and independently. After training, the low-level policy is able to finish all tasks without instructions from the curriculum generator. Experimental results on a wide range of benchmark robotics environments demonstrate that our method accelerates convergence considerably and improves training quality compared with the same method without the curriculum generator.
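The two-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify the training rules, so everything below is an assumption made for illustration. The curriculum generator is modelled as an epsilon-greedy bandit over discrete task levels, the action policy is replaced by a scalar "skill" stand-in, and "moderately difficult" is approximated by a hypothetical success-rate band of 0.2 to 0.8. All names (`CurriculumGenerator`, `ToyActionPolicy`, `moderate_difficulty_reward`, `train`) are invented here.

```python
import random


def moderate_difficulty_reward(success_rate, low=0.2, high=0.8):
    """Return 1.0 when the learner succeeds sometimes but not always.

    The [low, high] band is a hypothetical stand-in for "moderately difficult".
    """
    return 1.0 if low <= success_rate <= high else 0.0


class CurriculumGenerator:
    """High-level policy sketch: an epsilon-greedy bandit over task levels."""

    def __init__(self, num_levels, epsilon=0.2, step_size=0.3, rng=None):
        self.values = [0.0] * num_levels  # estimated usefulness of each level
        self.epsilon = epsilon
        self.step_size = step_size
        self.rng = rng or random.Random(0)

    def propose(self):
        # Explore a random level with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        best = max(self.values)
        return self.rng.choice([i for i, v in enumerate(self.values) if v == best])

    def update(self, level, reward):
        # Move the estimate for this level toward the observed reward.
        self.values[level] += self.step_size * (reward - self.values[level])


class ToyActionPolicy:
    """Low-level policy stand-in: a scalar skill instead of a learned controller."""

    def __init__(self, rng=None):
        self.skill = 0.0
        self.rng = rng or random.Random(1)

    def success_probability(self, level):
        # Tasks well above the current skill fail; tasks well below always succeed.
        return min(1.0, max(0.0, 0.5 + (self.skill - level)))

    def practice(self, level, episodes=20):
        p = self.success_probability(level)
        wins = sum(self.rng.random() < p for _ in range(episodes))
        rate = wins / episodes
        # Toy assumption: skill improves only on moderately difficult tasks.
        self.skill += 0.1 * moderate_difficulty_reward(rate)
        return rate


def train(num_levels=5, iterations=200):
    """Alternate between proposing a task level and practising it."""
    generator = CurriculumGenerator(num_levels)
    learner = ToyActionPolicy()
    history = []
    for _ in range(iterations):
        level = generator.propose()
        rate = learner.practice(level)
        generator.update(level, moderate_difficulty_reward(rate))
        history.append(level)
    return learner, history
```

Calling `learner, history = train()` returns the toy learner and the sequence of proposed levels. In this sketch the generator is rewarded only when the learner's success rate falls inside the band, which is one simple way to keep proposals near the learner's current ability; the paper's actual reward design may differ.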

This work is supported by the Shanghai Science and Technology Innovation Action Plan (No. 19511105900) and in part by the National Key Research and Development Project (No. 2018YFB1703201).

Author information

Authors and Affiliations

Key Laboratory of System Control and Information Processing, MOE of China, Shanghai Jiao Tong University, Shanghai, China
Zhenghua He, Chaochen Gu, Rui Xu & Kaijie Wu

Corresponding author

Correspondence to Chaochen Gu.

Editor information

Editors and Affiliations

Department of AI, Ping An Life, Shenzhen, China
Haiqin Yang
Faculty of Information Technology, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
City University of Hong Kong, Kowloon, China
Andrew Chi-Sing Leung
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
James T. Kwok
School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
Jonathan H. Chan
The Chinese University of Hong Kong, New Territories, Hong Kong
Irwin King

About this paper

Cite this paper

He, Z., Gu, C., Xu, R., Wu, K. (2020). Automatic Curriculum Generation by Hierarchical Reinforcement Learning. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_17

DOI: https://doi.org/10.1007/978-3-030-63833-7_17
Published: 20 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7
eBook Packages: Computer Science, Computer Science (R0)
