Abstract
Effective exploration in sparse-reward environments is a critical challenge for model-free reinforcement learning. Many current state-of-the-art approaches design intrinsic rewards to encourage exploration. However, when an environment contains multiple unexplored regions, such methods often concentrate on a single region and fail to fully explore the others. In this paper, we propose a new intrinsic reward approach, critical path-driven exploration (CPDE), which promotes exploration by paying extra attention to key points along the path to the goal, thereby encouraging the agent to take more favorable actions. We evaluate CPDE on several challenging procedurally generated tasks in MiniGrid, where it outperforms existing exploration methods. An analysis of the intrinsic rewards received by the agent shows that, compared with previous approaches, CPDE leads the agent to take more favorable actions and thus accelerates exploration in sparse-reward environments.
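The abstract only sketches the idea of rewarding visits to key points on the way to the goal. As a rough illustration of that kind of mechanism (not the authors' actual CPDE formulation; the key-point set, the `beta` scale, and the count-based decay here are all hypothetical assumptions), an intrinsic bonus could be layered on the extrinsic reward like this:

```python
from collections import defaultdict


class KeyPointBonus:
    """Illustrative sketch: add a visit-count-decayed intrinsic reward
    when the agent reaches designated key states on the path to the goal."""

    def __init__(self, key_points, beta=0.1):
        self.key_points = set(key_points)  # hypothetical set of key states
        self.visits = defaultdict(int)     # per-state visit counts
        self.beta = beta                   # bonus scale (assumed hyperparameter)

    def reward(self, state, extrinsic):
        """Return extrinsic reward plus a decaying key-point bonus."""
        if state in self.key_points:
            self.visits[state] += 1
            # Decay with visit count so the agent moves on to other regions
            # instead of fixating on one already-explored key point.
            return extrinsic + self.beta / (self.visits[state] ** 0.5)
        return extrinsic


bonus = KeyPointBonus(key_points=[(3, 1), (5, 4)], beta=0.1)
r1 = bonus.reward((3, 1), 0.0)  # first visit to a key point: bonus 0.1
r2 = bonus.reward((3, 1), 0.0)  # second visit: bonus 0.1 / sqrt(2)
```

The count-based decay mirrors the paper's stated goal of spreading exploration across multiple regions rather than letting one domain absorb all the intrinsic reward.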
Acknowledgements
We thank our colleagues for many helpful discussions and suggestions on this research and paper.
Funding
None.
Author information
Contributions
All authors have contributed to the research and the paper. YK and YD designed the idea of the paper. YD constructed the parts of networks. YK and YD wrote the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest/competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kong, Y., Dou, Y. Critical path-driven exploration on the MiniGrid environment. SOCA 17, 303–308 (2023). https://doi.org/10.1007/s11761-023-00365-9