Bayesian cue integration of structure from motion and CNN-based monocular depth estimation for autonomous robot navigation

  • Regular Paper
  • Published in: International Journal of Intelligent Robotics and Applications

Abstract

Monocular depth estimation (MDE) provides information (from a single image) about overall scene layout and is useful in robotics for autonomous navigation and vision-aided guidance. Advances in deep learning, particularly self-supervised convolutional neural networks (CNNs), have led to MDE models capable of providing highly accurate per-pixel depth maps. However, these models are typically tuned to specific datasets, leading to sharp performance degradation in real-world scenarios, particularly in robot vision tasks, where natural environments are too varied and complex to be sufficiently described by standard datasets. Motivated by biological vision, whose immense success relies on the optimal combination of multiple depth cues and knowledge about the underlying environment, we exploit structure from motion (SfM) through optical flow as an additional depth cue, together with prior knowledge about the depth distribution of the environment, to improve monocular depth prediction. There is, however, a general incompatibility between the outputs of the two cues: whereas SfM measures absolute distances, MDE is scale ambiguous, returning only depth ratios. Consequently, we show how the MDE cue can be promoted from an ordinal scale to the same metric scale as SfM, thus enabling their integration in a Bayesian optimal manner. Additionally, we generalize the relationship between camera tilt angles and the resulting MDE distortions, and show how this can be used to further improve depth perception robustness and accuracy (by up to 6.2%) for a mobile robot whose heading is subject to arbitrary angular inclinations.
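
To make the scale-promotion and fusion steps concrete, the sketch below shows one way the two cues could be combined, assuming independent Gaussian cue noise with known variances: a closed-form least-squares fit aligns the scale-ambiguous MDE map with sparse metric SfM depths, and the aligned cues are then fused by inverse-variance weighting, the Bayesian optimal (maximum-likelihood) combination for independent Gaussian cues. The function names, the single-global-scale model, and the variance handling are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def promote_to_metric(mde_rel, sfm_depth, sfm_mask):
    """Promote a scale-ambiguous MDE map to metric scale by fitting a
    single global scale s that minimizes ||s * mde_rel - sfm_depth||^2
    over pixels where SfM provides a valid metric depth."""
    d = mde_rel[sfm_mask]
    z = sfm_depth[sfm_mask]
    s = np.dot(d, z) / np.dot(d, d)  # closed-form least-squares scale
    return s * mde_rel

def fuse_cues(mde_metric, var_mde, sfm_depth, var_sfm):
    """Per-pixel Bayesian fusion of two independent Gaussian depth cues:
    the fused estimate is the inverse-variance-weighted mean, and the
    fused variance is lower than that of either cue alone."""
    w_mde = 1.0 / var_mde
    w_sfm = 1.0 / var_sfm
    fused = (w_mde * mde_metric + w_sfm * sfm_depth) / (w_mde + w_sfm)
    return fused, 1.0 / (w_mde + w_sfm)

# Toy demonstration with synthetic data (all values hypothetical).
rng = np.random.default_rng(0)
true_depth = rng.uniform(1.0, 10.0, size=(120, 160))
mde_rel = 0.37 * true_depth                                # correct ratios, unknown scale
sfm = true_depth + rng.normal(0.0, 0.3, true_depth.shape)  # metric but noisy
mask = rng.random(true_depth.shape) < 0.2                  # SfM is sparse

mde_metric = promote_to_metric(mde_rel, sfm, mask)
# Pixels without an SfM estimate get a huge variance, so the fused
# depth there falls back to the (now metric) MDE prediction.
var_sfm = np.where(mask, 0.3 ** 2, 1e9)
fused, fused_var = fuse_cues(mde_metric, 0.2 ** 2, sfm, var_sfm)
```

Under this model the fused variance, 1/(1/sigma_MDE^2 + 1/sigma_SfM^2), is always below either cue's individual variance, which is the quantitative sense in which the combination is optimal.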



Funding

We have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or personal interest in any business or product.

Author information


Contributions

The contributions of the authors to this manuscript are as follows: FM conceptualized the project framework. AM helped concretize and refine the initial ideas. Both authors jointly built the robotic platform and fitted the hardware components. The algorithms were developed and programmed jointly. Both authors carried out the experiments and wrote approximately equal portions of the text. The graphics are also the work of both authors.

Corresponding author

Correspondence to Fuseini Mumuni.

Ethics declarations

Conflict of interest

We declare that there is no conflict of interest associated with this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mumuni, F., Mumuni, A. Bayesian cue integration of structure from motion and CNN-based monocular depth estimation for autonomous robot navigation. Int J Intell Robot Appl 6, 191–206 (2022). https://doi.org/10.1007/s41315-022-00226-2


Keywords

Navigation