Exploration via Progress-Driven Intrinsic Rewards

Bougie, Nicolas; Ichise, Ryutaro

doi:10.1007/978-3-030-61616-8_22

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12397)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Traditional exploration methods in reinforcement learning rely on well-designed extrinsic rewards. However, many real-world scenarios involve sparse or delayed rewards. One solution, inspired by curious behaviors in animals, is to let the agent develop its own intrinsic rewards. In this paper, we propose a novel end-to-end curiosity mechanism that uses learning progress as a novelty bonus. We compare a policy-based and a visual-based progress bonus for moving the agent towards hard-to-learn regions of the state space. We further leverage the agent's learning to identify the most critical regions, which results in more sample-efficient and global exploration strategies. We evaluate our method on a variety of benchmark environments, including Minigrid, Super Mario Bros., and Atari games. Experimental results show that our method outperforms prior approaches on most tasks in terms of exploration efficiency and average scores, especially on tasks that feature high-level exploration patterns or deceptive rewards.
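The abstract does not give the exact form of the progress bonus, so the following is only a minimal sketch of the general idea of rewarding learning progress, not the authors' method. It assumes that progress at a state can be approximated by the drop in some model's prediction error between two consecutive visits. The class name LearningProgressBonus, the function total_reward, the state keys, and the parameters scale and beta are all hypothetical names introduced for illustration.

```python
class LearningProgressBonus:
    """Hypothetical helper: intrinsic reward = recent drop in prediction error.

    Assumption (not from the paper): progress at a state is measured as
    previous_error - current_error, clipped at zero.
    """

    def __init__(self, scale=1.0):
        self.scale = scale
        self.prev_error = {}  # last observed prediction error per state key

    def bonus(self, state_key, error):
        """Return the progress bonus for this visit and store the new error."""
        prev = self.prev_error.get(state_key)
        self.prev_error[state_key] = error
        if prev is None:
            # First visit: no earlier error to compare against.
            return 0.0
        # Reward only improvement; an error that rises or stalls earns nothing.
        return self.scale * max(0.0, prev - error)


def total_reward(extrinsic, intrinsic, beta=0.1):
    """Combine the environment reward with the weighted intrinsic bonus."""
    return extrinsic + beta * intrinsic


lp = LearningProgressBonus()
first = lp.bonus("room_a", 1.0)    # first visit, no progress measurable yet
second = lp.bonus("room_a", 0.5)   # error dropped by 0.5
third = lp.bonus("room_a", 0.5)    # error unchanged, so no bonus
```

Under this assumption, a state that the model has fully learned (error no longer falling) and a state it cannot learn at all (error never falling) both stop producing a bonus, which is the usual motivation for preferring progress over raw prediction error.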

Author information

Authors and Affiliations

The Graduate University for Advanced Studies, Sokendai, Tokyo, Japan
Nicolas Bougie & Ryutaro Ichise
National Institute of Informatics, Tokyo, Japan
Nicolas Bougie & Ryutaro Ichise

Corresponding author

Correspondence to Nicolas Bougie.

Editor information

Editors and Affiliations

Department of Applied Informatics, Comenius University in Bratislava, Bratislava, Slovakia
Igor Farkaš
Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
Paolo Masulli
Department of Informatics, University of Hamburg, Hamburg, Germany
Stefan Wermter

About this paper

Cite this paper

Bougie, N., Ichise, R. (2020). Exploration via Progress-Driven Intrinsic Rewards. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_22

DOI: https://doi.org/10.1007/978-3-030-61616-8_22
Published: 14 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61615-1
Online ISBN: 978-3-030-61616-8
eBook Packages: Computer Science, Computer Science (R0)
