Abstract
Exploration is a major challenge in deep reinforcement learning, especially when rewards are sparse. Simple random exploration strategies, such as \(\epsilon\)-greedy, struggle to solve hard exploration problems in sparse reward environments. A more effective approach is exploration based on intrinsic motivation, where the key is to design a reasonable and effective intrinsic reward that drives the agent to explore. This paper proposes a method called CEMP, which drives the agent to explore more effectively and continuously in sparse reward environments. CEMP contributes a new framework for designing intrinsic reward from multiple perspectives and can be easily integrated into various existing reinforcement learning algorithms. Experimental results in a series of complex, sparse reward MiniGrid environments show that CEMP achieves better final performance and learns faster than ICM, RIDE, and TRPO-AE-Hash, each of which computes intrinsic reward from a single perspective.
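As a rough illustration of the general idea of intrinsic reward, the sketch below adds a weighted sum of exploration bonuses to the environment reward. It is not the CEMP formulation, which the abstract does not specify. The count-based bonus 1/sqrt(N(s)), the names `CountBonus` and `shaped_reward`, and the weights are all assumptions chosen for illustration only.

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical count-based bonus: 1 / sqrt(N(s)) for a hashable state key."""

    def __init__(self):
        self.counts = defaultdict(int)

    def __call__(self, state_key):
        # Increment the visit count first, so the first visit yields a bonus of 1.0.
        self.counts[state_key] += 1
        return 1.0 / math.sqrt(self.counts[state_key])


def shaped_reward(extrinsic, bonuses, weights):
    """Return the extrinsic reward plus a weighted sum of intrinsic bonuses.

    Each entry of `bonuses` stands for one "perspective"; which perspectives
    to use and how to weight them are assumptions of this sketch.
    """
    if len(bonuses) != len(weights):
        raise ValueError("bonuses and weights must have the same length")
    return extrinsic + sum(w * b for w, b in zip(weights, bonuses))


if __name__ == "__main__":
    lifelong = CountBonus()   # counts visits across all episodes
    episodic = CountBonus()   # would be re-created at every episode start
    state = (3, 4)            # e.g. an agent position in a grid world
    total = shaped_reward(
        0.0,                                   # sparse extrinsic reward
        [lifelong(state), episodic(state)],    # two illustrative bonuses
        [0.1, 0.05],                           # illustrative weights
    )
    print(total)
```

Because the combined value is a plain scalar, it could be passed to any policy-gradient or value-based learner in place of the raw environment reward, which is one way to read the abstract's claim of easy integration with existing algorithms.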
References
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
Deci, E.L., Ryan, R.M.: Intrinsic Motivation and Self-determination in Human Behavior. Springer, Cham (2013). https://doi.org/10.1007/978-1-4899-2271-7
Ostrovski, G., Bellemare, M.G., Oord, A., Munos, R.: Count-based exploration with neural density models. In: International Conference on Machine Learning, pp. 2721–2730. PMLR (2017)
Martin, J., Sasikumar, S.N., Everitt, T., Hutter, M.: Count-based exploration in feature space for reinforcement learning. arXiv preprint arXiv:1706.08090 (2017)
Tang, H., et al.: #Exploration: a study of count-based exploration for deep reinforcement learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pp. 380–388 (2002)
Choshen, L., Fox, L., Loewenstein, Y.: Dora the explorer: directed outreaching reinforcement action-selection. arXiv preprint arXiv:1804.04012 (2018)
Burda, Y., Edwards, H., Storkey, A., Klimov, O.: Exploration by random network distillation. arXiv preprint arXiv:1810.12894 (2018)
Stadie, B.C., Levine, S., Abbeel, P.: Incentivizing exploration in reinforcement learning with deep predictive models. arXiv preprint arXiv:1507.00814 (2015)
Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: International Conference on Machine Learning, pp. 2778–2787. PMLR (2017)
Pathak, D., Gandhi, D., Gupta, A.: Self-supervised exploration via disagreement. In: International Conference on Machine Learning, pp. 5062–5071. PMLR (2019)
Stanton, C., Clune, J.: Deep curiosity search: intra-life exploration improves performance on challenging deep reinforcement learning problems. arXiv preprint arXiv:1806.00553 (2018)
Savinov, N., et al.: Episodic curiosity through reachability. arXiv preprint arXiv:1810.02274 (2018)
Raileanu, R., Rocktäschel, T.: RIDE: rewarding impact-driven exploration for procedurally-generated environments. arXiv preprint arXiv:2002.12292 (2020)
Acknowledgments
This work was supported by the National Defense Science and Technology Foundation Reinforcement Program and the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDA27041001.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chen, Z., Guan, Q. (2024). Continuous Exploration via Multiple Perspectives in Sparse Reward Environment. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14427. Springer, Singapore. https://doi.org/10.1007/978-981-99-8435-0_5
DOI: https://doi.org/10.1007/978-981-99-8435-0_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8434-3
Online ISBN: 978-981-99-8435-0
eBook Packages: Computer Science, Computer Science (R0)