
FasterMDE: A real-time monocular depth estimation search method that balances accuracy and speed on the edge

Published in Applied Intelligence.

Abstract

Monocular depth estimation (MDE) is critical in enabling intelligent autonomous systems and has received considerable attention in recent years. Achieving both low latency and high accuracy in MDE is desirable but challenging, especially on edge devices. In this paper, we present a novel approach to balancing speed and accuracy in MDE on edge devices. We introduce FasterMDE, an efficient and fast encoder-decoder network architecture that leverages a multiobjective neural architecture search method to find the optimal encoder structure for the target edge device. Moreover, we incorporate a neural window fully connected CRF module into the network as the decoder, enhancing fine-grained depth prediction based on coarse depth and image features. To address the issue of poor local minima in the multiobjective neural architecture search, we propose a new approach for automatically learning the weights of the subobjective loss functions based on uncertainty. We also accelerate the FasterMDE model using TensorRT and implement it on a target edge device. The experimental results demonstrate that FasterMDE achieves a better balance of speed and accuracy on the KITTI and NYUv2 datasets compared to previous methods. We validate the effectiveness of the proposed method through an ablation study and verify the real-time monocular depth estimation performance of FasterMDE in realistic scenarios. On the KITTI dataset, the FasterMDE model achieves a high frame rate of 555.55 FPS with 9.1% Abs Rel on a single NVIDIA Titan RTX GPU and 14.46 FPS on the NVIDIA Jetson Xavier NX.
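The uncertainty-based weighting of subobjective losses mentioned above is commonly implemented with learnable log-variance parameters, in the style of homoscedastic-uncertainty weighting: each sub-loss L_i is scaled by exp(-s_i), and an additive s_i term keeps the learned scale from growing without bound. The abstract does not give the paper's exact formulation, so the sketch below is an illustrative assumption, not the authors' implementation; the names `combined_loss`, `sub_losses`, and `log_vars` are hypothetical.

```python
import math

def combined_loss(sub_losses, log_vars):
    """Uncertainty-weighted sum of subobjective losses.

    sub_losses -- list of scalar sub-loss values L_i (e.g. an accuracy
                  term and a latency term in a multiobjective search)
    log_vars   -- list of learnable log-variances s_i, one per sub-loss

    Each L_i is down-weighted by exp(-s_i); the additive s_i term
    penalises trivially inflating s_i to ignore a sub-loss.
    """
    assert len(sub_losses) == len(log_vars)
    total = 0.0
    for loss_i, s_i in zip(sub_losses, log_vars):
        total += math.exp(-s_i) * loss_i + s_i
    return total

# With all s_i = 0 the weights start equal (exp(0) = 1), so the search
# begins as a plain sum of the sub-losses.
total = combined_loss([0.8, 0.3], [0.0, 0.0])  # 0.8 + 0.3 = 1.1
```

In a real search the `log_vars` would be optimized jointly with the architecture parameters, so the balance between accuracy and latency objectives is learned rather than hand-tuned.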


Data Availability

The datasets generated during the current study are available in the GitHub repository at https://github.com/douziwenhit/FasterMDE.git.


Author information

Authors and Affiliations

Authors

Contributions

DouZiWen: conceptualization, methodology, validation, investigation, writing; YeDong: supervision; LiYuQi: data curation.

Corresponding author

Correspondence to Ye Dong.

Ethics declarations

Conflict of Interest

No potential conflict of interest was reported by the authors.

Competing Interests

The authors did not receive support from any organization for the submitted work.

Ethical and informed consent for data used

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

ZiWen, D., YuQi, L. & Dong, Y. FasterMDE: A real-time monocular depth estimation search method that balances accuracy and speed on the edge. Appl Intell 53, 24566–24586 (2023). https://doi.org/10.1007/s10489-023-04872-2

