
Learning to explore by reinforcement over high-level options

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

Autonomous 3D environment exploration is a fundamental task for applications such as navigation and object searching. The goal of exploration is to investigate a new environment and build a map of it efficiently. In this paper, we propose a new method that grants an agent two intertwined behavioral options: “look-around” and “frontier navigation.” The method is implemented with an option-critic architecture and trained by reinforcement learning. At each time step, the agent produces an option and a corresponding action according to its policy. We also take advantage of macro-actions by incorporating classic path-planning techniques, which increases training efficiency. We demonstrate the effectiveness of the proposed method on two publicly available 3D environment datasets; the results show that our method achieves higher coverage than competing techniques with better efficiency. We also show that our method transfers to a rover robot in real-world environments.
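To make the per-step control flow concrete, the sketch below illustrates how an option-critic style agent could alternate between the two options described above: a policy over options selects either “look-around” or “frontier navigation,” the active option's intra-option policy emits an action (for frontier navigation, a macro-action that a classic path planner would expand into waypoints), and a termination function decides when control returns to the policy over options. All class, method, and action names here are illustrative assumptions for this sketch, not the authors' implementation.

```python
# Minimal sketch of the per-step option-critic control flow described in the
# abstract. Names and the random placeholder policies are assumptions made
# for illustration only.
import random

OPTIONS = ["look_around", "frontier_navigation"]


class OptionCriticAgentSketch:
    def __init__(self):
        self.current_option = None

    def policy_over_options(self, observation):
        # Placeholder for the learned policy over options
        # (in practice, a neural network head over map/image features).
        return random.choice(OPTIONS)

    def intra_option_policy(self, option, observation):
        # Placeholder intra-option policies: "look_around" emits primitive
        # turning actions, while "frontier_navigation" emits a macro-action
        # that a classic path planner would expand into low-level waypoints.
        if option == "look_around":
            return random.choice(["turn_left", "turn_right"])
        return {"macro_action": "navigate_to_frontier", "frontier": (3, 5)}

    def termination(self, option, observation):
        # Placeholder termination function beta(s, o); here a coin flip.
        return random.random() < 0.2

    def step(self, observation):
        # Re-select an option only when none is active or the current one
        # terminates; otherwise keep executing the active option.
        if self.current_option is None or self.termination(self.current_option, observation):
            self.current_option = self.policy_over_options(observation)
        action = self.intra_option_policy(self.current_option, observation)
        return self.current_option, action


if __name__ == "__main__":
    agent = OptionCriticAgentSketch()
    for t in range(5):
        option, action = agent.step(observation=None)
        print(t, option, action)
```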



Acknowledgements

This project was funded by Science for Technological Innovation (https://www.sftichallenge.govt.nz/) under the spearhead project “Adaptive learning robots to complement the human workforce.”

Author information

Corresponding author

Correspondence to Juncheng Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (mp4 52391 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, J., McCane, B. & Mills, S. Learning to explore by reinforcement over high-level options. Machine Vision and Applications 35, 6 (2024). https://doi.org/10.1007/s00138-023-01492-1


Keywords

Navigation