Continuous Exploration via Multiple Perspectives in Sparse Reward Environment

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14427)

Abstract

Exploration is a major challenge in deep reinforcement learning, especially when rewards are sparse. Simple random exploration strategies, such as \(\epsilon\)-greedy, struggle with hard exploration problems in sparse reward environments. A more effective approach is an exploration strategy based on intrinsic motivation, where the key is to design a reasonable and effective intrinsic reward that drives the agent to explore. This paper proposes a method called CEMP, which drives the agent to explore more effectively and continuously in sparse reward environments. CEMP contributes a new framework for designing intrinsic rewards from multiple perspectives and can be easily integrated into various existing reinforcement learning algorithms. Experimental results on a series of complex, sparse reward environments in MiniGrid demonstrate that CEMP achieves better final performance and faster learning than ICM, RIDE, and TRPO-AE-Hash, which compute intrinsic rewards from only a single perspective.
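
As an informal illustration only (not the paper's actual CEMP formulation), the following minimal Python sketch shows the general idea behind a multi-perspective intrinsic reward: two commonly used perspectives, a count-based novelty bonus and a forward-model prediction-error bonus, are combined with weights and added to the environment's extrinsic reward during training. The class name, the choice of perspectives, and the weights are assumptions made for this sketch.

import numpy as np

class MultiPerspectiveIntrinsicReward:
    """Illustrative sketch (not the paper's implementation): combine intrinsic
    rewards computed from two perspectives, count-based novelty and
    forward-model prediction error."""

    def __init__(self, beta_count=0.5, beta_error=0.5):
        self.visit_counts = {}        # hashed/discretized state -> visit count
        self.beta_count = beta_count  # weight of the novelty perspective
        self.beta_error = beta_error  # weight of the prediction-error perspective

    def count_bonus(self, state_key):
        # Novelty perspective: bonus decays with the square root of visits.
        n = self.visit_counts.get(state_key, 0) + 1
        self.visit_counts[state_key] = n
        return 1.0 / np.sqrt(n)

    def prediction_bonus(self, predicted_next, actual_next):
        # Curiosity perspective: forward-model prediction error as surprise.
        diff = np.asarray(predicted_next) - np.asarray(actual_next)
        return float(np.mean(diff ** 2))

    def intrinsic_reward(self, state_key, predicted_next, actual_next):
        # Weighted combination of both perspectives.
        return (self.beta_count * self.count_bonus(state_key)
                + self.beta_error * self.prediction_bonus(predicted_next, actual_next))

# Usage inside a generic RL training loop: the agent is trained on the sum of
# the extrinsic reward and the multi-perspective intrinsic bonus.
bonus = MultiPerspectiveIntrinsicReward()
r_int = bonus.intrinsic_reward(state_key=(3, 4),
                               predicted_next=np.zeros(8),
                               actual_next=np.full(8, 0.1))
# total_reward = r_ext + r_int

In this sketch the weights, the hashing of states into state_key, and the forward model producing predicted_next are all placeholders; any existing algorithm (for example PPO) could consume total_reward in place of the extrinsic reward alone, which is the sense in which such intrinsic bonuses plug into existing methods.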

References

  1. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  2. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

  3. Deci, E.L., Ryan, R.M.: Intrinsic Motivation and Self-determination in Human Behavior. Springer, Cham (2013). https://doi.org/10.1007/978-1-4899-2271-7

  4. Ostrovski, G., Bellemare, M.G., Oord, A., Munos, R.: Count-based exploration with neural density models. In: International Conference on Machine Learning, pp. 2721–2730. PMLR (2017)

  5. Martin, J., Sasikumar, S.N., Everitt, T., Hutter, M.: Count-based exploration in feature space for reinforcement learning. arXiv preprint arXiv:1706.08090 (2017)

  6. Tang, H., et al.: #Exploration: a study of count-based exploration for deep reinforcement learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  7. Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pp. 380–388 (2002)

  8. Choshen, L., Fox, L., Loewenstein, Y.: DORA the explorer: directed outreaching reinforcement action-selection. arXiv preprint arXiv:1804.04012 (2018)

  9. Burda, Y., Edwards, H., Storkey, A., Klimov, O.: Exploration by random network distillation. arXiv preprint arXiv:1810.12894 (2018)

  10. Stadie, B.C., Levine, S., Abbeel, P.: Incentivizing exploration in reinforcement learning with deep predictive models. arXiv preprint arXiv:1507.00814 (2015)

  11. Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: International Conference on Machine Learning, pp. 2778–2787. PMLR (2017)

  12. Pathak, D., Gandhi, D., Gupta, A.: Self-supervised exploration via disagreement. In: International Conference on Machine Learning, pp. 5062–5071. PMLR (2019)

  13. Stanton, C., Clune, J.: Deep curiosity search: intra-life exploration improves performance on challenging deep reinforcement learning problems. arXiv preprint arXiv:1806.00553 (2018)

  14. Savinov, N., et al.: Episodic curiosity through reachability. arXiv preprint arXiv:1810.02274 (2018)

  15. Raileanu, R., Rocktäschel, T.: RIDE: rewarding impact-driven exploration for procedurally-generated environments. arXiv preprint arXiv:2002.12292 (2020)

Acknowledgments

This work was supported by the National Defense Science and Technology Foundation Reinforcement Program and the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDA27041001.

Author information

Corresponding author

Correspondence to Qiang Guan.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chen, Z., Guan, Q. (2024). Continuous Exploration via Multiple Perspectives in Sparse Reward Environment. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14427. Springer, Singapore. https://doi.org/10.1007/978-981-99-8435-0_5

  • DOI: https://doi.org/10.1007/978-981-99-8435-0_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8434-3

  • Online ISBN: 978-981-99-8435-0

  • eBook Packages: Computer Science, Computer Science (R0)
