
Critical path-driven exploration on the MiniGrid environment

  • Special Issue Paper
  • Published in Service Oriented Computing and Applications (SOCA)

Abstract

Exploring effectively in sparse-reward environments is a critical challenge for model-free reinforcement learning. Many current state-of-the-art methods design intrinsic rewards to encourage exploration. However, when an environment contains multiple new domains to explore, such methods often focus on one domain and leave the others insufficiently explored. In this paper, we propose a new intrinsic reward approach, critical path-driven exploration (CPDE), which promotes exploration by paying extra attention to key points on the way to the goal, thereby encouraging the agent to take more favorable actions. We evaluate CPDE on several challenging procedurally generated tasks in MiniGrid. Our experiments show that it outperforms existing exploration methods in the MiniGrid environment. Analyzing the intrinsic rewards received by the agent, we find that our approach leads the agent to take more favorable actions than previous approaches do, thus accelerating exploration in sparse-reward environments.
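To make the abstract's idea concrete, the sketch below shows one way a critical-path-style intrinsic bonus could be combined with a sparse extrinsic reward. It is an illustration built on assumptions, not the authors' implementation: the key-point set (e.g., doorways or corridor junctions), the scale `beta`, the first-visit-per-episode rule, and the 1/sqrt(N) decay are all hypothetical choices; CPDE's actual reward definition appears in the full paper.

```python
import numpy as np

class CriticalPathBonus:
    """Minimal sketch of a critical-path-style intrinsic reward.

    Assumptions (not taken from the paper): key points are a fixed set
    of grid coordinates on a path to the goal; each key point pays a
    bonus the first time it is reached in an episode; and the bonus
    decays with a global visit count so it fades as training proceeds.
    """

    def __init__(self, key_points, beta=0.1):
        self.key_points = set(key_points)   # hypothetical key points, e.g. doorways
        self.beta = beta                    # intrinsic reward scale (assumed)
        self.visit_counts = {p: 0 for p in self.key_points}
        self.reached_this_episode = set()

    def reset(self):
        """Call at the start of each episode."""
        self.reached_this_episode.clear()

    def intrinsic_reward(self, agent_pos):
        """Bonus for the first arrival at a key point this episode, decayed by 1/sqrt(N)."""
        pos = tuple(agent_pos)
        if pos in self.key_points and pos not in self.reached_this_episode:
            self.reached_this_episode.add(pos)
            self.visit_counts[pos] += 1
            return self.beta / np.sqrt(self.visit_counts[pos])
        return 0.0

# Usage during a rollout: augment the sparse extrinsic reward.
#   bonus = CriticalPathBonus(key_points=[(3, 5), (7, 5)], beta=0.1)
#   bonus.reset()
#   r_total = r_extrinsic + bonus.intrinsic_reward(env.agent_pos)
```

Under these assumptions, each key point pays at most once per episode and the payment decays across episodes, so the bonus steers early exploration along the path without dominating the extrinsic reward later in training.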



Acknowledgements

We thank our colleagues for many helpful discussions and suggestions on the research and the paper.

Funding

None.

Author information


Contributions

All authors contributed to the research and the paper. YK and YD conceived the idea of the paper. YD constructed the network components. YK and YD wrote the paper.

Corresponding author

Correspondence to Yan Kong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest/competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kong, Y., Dou, Y. Critical path-driven exploration on the MiniGrid environment. SOCA 17, 303–308 (2023). https://doi.org/10.1007/s11761-023-00365-9

