Abstract
Effective exploration in sparse-reward environments is a critical challenge for model-free reinforcement learning. Many current state-of-the-art approaches design intrinsic rewards to encourage exploration. However, when an environment contains multiple unexplored regions, such methods often concentrate on a single region and fail to fully explore the others. In this paper, we propose a new intrinsic reward approach, critical path-driven exploration (CPDE), which promotes exploration by paying extra attention to key points along the path to the goal, thereby encouraging the agent to take more favorable actions. We evaluate CPDE on several challenging procedurally generated tasks in MiniGrid, where it outperforms existing exploration methods. An analysis of the intrinsic rewards received by the agent shows that, compared with previous approaches, CPDE leads the agent to take more favorable actions and thus accelerates exploration in sparse-reward environments.
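The abstract only sketches the idea of rewarding visits to key points on the way to the goal. As a rough illustration of that kind of mechanism (not the authors' actual CPDE formulation; the key-point set, the `beta` scale, and the count-based decay here are all hypothetical assumptions), an intrinsic bonus could be layered on the extrinsic reward like this:

```python
from collections import defaultdict


class KeyPointBonus:
    """Illustrative sketch: add a visit-count-decayed intrinsic reward
    when the agent reaches designated key states on the path to the goal."""

    def __init__(self, key_points, beta=0.1):
        self.key_points = set(key_points)  # hypothetical set of key states
        self.visits = defaultdict(int)     # per-state visit counts
        self.beta = beta                   # bonus scale (assumed hyperparameter)

    def reward(self, state, extrinsic):
        """Return extrinsic reward plus a decaying key-point bonus."""
        if state in self.key_points:
            self.visits[state] += 1
            # Decay with visit count so the agent moves on to other regions
            # instead of fixating on one already-explored key point.
            return extrinsic + self.beta / (self.visits[state] ** 0.5)
        return extrinsic


bonus = KeyPointBonus(key_points=[(3, 1), (5, 4)], beta=0.1)
r1 = bonus.reward((3, 1), 0.0)  # first visit to a key point: bonus 0.1
r2 = bonus.reward((3, 1), 0.0)  # second visit: bonus 0.1 / sqrt(2)
```

The count-based decay mirrors the paper's stated goal of spreading exploration across multiple regions rather than letting one domain absorb all the intrinsic reward.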
Acknowledgements
We thank our colleagues for many helpful discussions and suggestions on this research and paper.
Funding
None.
Author information
Contributions
All authors have contributed to the research and the paper. YK and YD designed the idea of the paper. YD constructed the parts of networks. YK and YD wrote the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest/competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kong, Y., Dou, Y. Critical path-driven exploration on the MiniGrid environment. SOCA 17, 303–308 (2023). https://doi.org/10.1007/s11761-023-00365-9