
GoToNet: Fast Monocular Scene Exposure and Exploration

  • Regular paper
  • Published in: Journal of Intelligent & Robotic Systems

Abstract

Autonomous scene exposure and exploration, especially in localization- or communication-denied areas, is useful for finding targets in unknown scenes, yet it remains a challenging problem in robot navigation. In this work, we present a novel method for real-time environment exploration whose only requirements are a visually similar dataset for pre-training, sufficient lighting in the scene, and an on-board forward-looking RGB camera for environmental sensing. In contrast to existing methods, our method requires only one look (image) to make a good tactical decision, and therefore runs in constant, non-growing time. The core of our method is a pair of direction predictions, characterized by pixels dubbed the Goto and Lookat pixels. These pixels encode the recommended flight instructions as follows: the Goto pixel defines the direction in which the agent should move by one distance unit, and the Lookat pixel defines the direction in which the camera should point in the next step. These flying-instruction pixels are optimized to expose the largest amount of currently unexplored area. Our method is a novel deep-learning-based navigation approach that solves this problem and demonstrates its ability in an even more demanding setup, i.e., when computational power is limited. In addition, we propose a way to generate a navigation-oriented dataset, enabling efficient training of our method using RGB and depth images. Tests conducted in a simulator, evaluating both the inference of the sparse pixel coordinates and 2D and 3D test flights aimed at unveiling areas and decreasing distances to targets, achieve promising results. A comparison against a state-of-the-art algorithm shows that our method outperforms it on the metrics of new voxels per camera pose, minimum distance to target, percentage of surface voxels seen, and compute time.
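As an illustration of the flying-instruction pixels (not part of the published method's code), the following minimal Python sketch shows one way the predicted Goto and Lookat pixels could be converted into a motion command under a standard pinhole camera model; the function names, intrinsics, and one-unit step length are assumptions made for this example.

    import numpy as np

    # Hypothetical helper: back-project a pixel (u, v) into a unit direction
    # vector in the camera frame using the pinhole model.
    def pixel_to_ray(u, v, fx, fy, cx, cy):
        direction = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
        return direction / np.linalg.norm(direction)

    # Hypothetical helper: turn the two predicted pixels into a translation
    # (move one distance unit toward the Goto direction) and a gaze direction
    # (point the camera toward the Lookat direction at the next step).
    def step_from_predictions(goto_px, lookat_px, intrinsics, step_len=1.0):
        fx, fy, cx, cy = intrinsics
        move_dir = pixel_to_ray(*goto_px, fx, fy, cx, cy)
        gaze_dir = pixel_to_ray(*lookat_px, fx, fy, cx, cy)
        return step_len * move_dir, gaze_dir

    # Example with assumed intrinsics for a 640x480 camera (focal length 525 px).
    translation, next_gaze = step_from_predictions(
        goto_px=(400, 220), lookat_px=(360, 240),
        intrinsics=(525.0, 525.0, 320.0, 240.0))

In a real system these camera-frame directions would be transformed into the world frame using the current camera pose before being issued as flight commands.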



Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information


Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Tom Avrech. The first draft of the manuscript was written by Tom Avrech, edited by Evgenii Zheltonozhskii, Chaim Baskin and Ehud Rivlin. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tom Avrech.

Ethics declarations

Conflict of Interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Code or data availability

The code for reproducing our experiments will be made public upon acceptance.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Avrech, T., Zheltonozhskii, E., Baskin, C. et al. GoToNet: Fast Monocular Scene Exposure and Exploration. J Intell Robot Syst 105, 65 (2022). https://doi.org/10.1007/s10846-022-01646-9


Keywords

Navigation