Continuous Exploration via Multiple Perspectives in Sparse Reward Environment

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14427)

Abstract

Exploration is a major challenge in deep reinforcement learning, especially when rewards are sparse. Simple random exploration strategies, such as \(\epsilon\)-greedy, struggle with hard exploration problems in sparse reward environments. A more effective approach is an exploration strategy based on intrinsic motivation, where the key is to design a reasonable and effective intrinsic reward that drives the agent to explore. This paper proposes a method called CEMP, which drives the agent to explore more effectively and continuously in sparse reward environments. CEMP contributes a new framework for designing intrinsic rewards from multiple perspectives and can be easily integrated into various existing reinforcement learning algorithms. Experimental results on a series of complex, sparse reward environments in MiniGrid demonstrate that CEMP achieves better final performance and faster learning than ICM, RIDE, and TRPO-AE-Hash, which compute intrinsic rewards from only a single perspective.
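
As an informal illustration only (not the paper's actual CEMP formulation), the following minimal Python sketch shows the general idea behind a multi-perspective intrinsic reward: two commonly used perspectives, a count-based novelty bonus and a forward-model prediction-error bonus, are combined with weights and added to the environment's extrinsic reward during training. The class name, the choice of perspectives, and the weights are assumptions made for this sketch.

import numpy as np

class MultiPerspectiveIntrinsicReward:
    """Illustrative sketch (not the paper's implementation): combine intrinsic
    rewards computed from two perspectives, count-based novelty and
    forward-model prediction error."""

    def __init__(self, beta_count=0.5, beta_error=0.5):
        self.visit_counts = {}        # hashed/discretized state -> visit count
        self.beta_count = beta_count  # weight of the novelty perspective
        self.beta_error = beta_error  # weight of the prediction-error perspective

    def count_bonus(self, state_key):
        # Novelty perspective: bonus decays with the square root of visits.
        n = self.visit_counts.get(state_key, 0) + 1
        self.visit_counts[state_key] = n
        return 1.0 / np.sqrt(n)

    def prediction_bonus(self, predicted_next, actual_next):
        # Curiosity perspective: forward-model prediction error as surprise.
        diff = np.asarray(predicted_next) - np.asarray(actual_next)
        return float(np.mean(diff ** 2))

    def intrinsic_reward(self, state_key, predicted_next, actual_next):
        # Weighted combination of both perspectives.
        return (self.beta_count * self.count_bonus(state_key)
                + self.beta_error * self.prediction_bonus(predicted_next, actual_next))

# Usage inside a generic RL training loop: the agent is trained on the sum of
# the extrinsic reward and the multi-perspective intrinsic bonus.
bonus = MultiPerspectiveIntrinsicReward()
r_int = bonus.intrinsic_reward(state_key=(3, 4),
                               predicted_next=np.zeros(8),
                               actual_next=np.full(8, 0.1))
# total_reward = r_ext + r_int

In this sketch the weights, the hashing of states into state_key, and the forward model producing predicted_next are all placeholders; any existing algorithm (for example PPO) could consume total_reward in place of the extrinsic reward alone, which is the sense in which such intrinsic bonuses plug into existing methods.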

References

  1. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  2. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

  3. Deci, E.L., Ryan, R.M.: Intrinsic Motivation and Self-determination in Human Behavior. Springer, Cham (2013). https://doi.org/10.1007/978-1-4899-2271-7

  4. Ostrovski, G., Bellemare, M.G., Oord, A., Munos, R.: Count-based exploration with neural density models. In: International Conference on Machine Learning, pp. 2721–2730. PMLR (2017)

  5. Martin, J., Sasikumar, S.N., Everitt, T., Hutter, M.: Count-based exploration in feature space for reinforcement learning. arXiv preprint arXiv:1706.08090 (2017)

  6. Tang, H., et al.: #Exploration: a study of count-based exploration for deep reinforcement learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  7. Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pp. 380–388 (2002)

  8. Choshen, L., Fox, L., Loewenstein, Y.: DORA the explorer: directed outreaching reinforcement action-selection. arXiv preprint arXiv:1804.04012 (2018)

  9. Burda, Y., Edwards, H., Storkey, A., Klimov, O.: Exploration by random network distillation. arXiv preprint arXiv:1810.12894 (2018)

  10. Stadie, B.C., Levine, S., Abbeel, P.: Incentivizing exploration in reinforcement learning with deep predictive models. arXiv preprint arXiv:1507.00814 (2015)

  11. Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: International Conference on Machine Learning, pp. 2778–2787. PMLR (2017)

  12. Pathak, D., Gandhi, D., Gupta, A.: Self-supervised exploration via disagreement. In: International Conference on Machine Learning, pp. 5062–5071. PMLR (2019)

  13. Stanton, C., Clune, J.: Deep curiosity search: intra-life exploration improves performance on challenging deep reinforcement learning problems. arXiv preprint arXiv:1806.00553 (2018)

  14. Savinov, N., et al.: Episodic curiosity through reachability. arXiv preprint arXiv:1810.02274 (2018)

  15. Raileanu, R., Rocktäschel, T.: RIDE: rewarding impact-driven exploration for procedurally-generated environments. arXiv preprint arXiv:2002.12292 (2020)

Acknowledgments

This work was supported by the National Defense Science and Technology Foundation Reinforcement Program and the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDA27041001.

Author information

Corresponding author

Correspondence to Qiang Guan.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chen, Z., Guan, Q. (2024). Continuous Exploration via Multiple Perspectives in Sparse Reward Environment. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14427. Springer, Singapore. https://doi.org/10.1007/978-981-99-8435-0_5

  • DOI: https://doi.org/10.1007/978-981-99-8435-0_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8434-3

  • Online ISBN: 978-981-99-8435-0

  • eBook Packages: Computer Science, Computer Science (R0)
