
Towards unified on-road object detection and depth estimation from a single image

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

On-road object detection based on convolutional neural networks (CNNs) is an important problem in autonomous driving. However, traditional 2D object detection only classifies and localizes objects in image space and cannot recover depth information, and cascading an object detection network with a separate monocular depth estimation network is an inefficient way to realize 2.5D detection. To address this problem, we propose a unified multi-task learning mechanism for object detection and depth estimation. First, we propose a novel loss function, the projective consistency loss, which uses the perspective projection principle to model the relationship between an object's size in the image and its depth, so that the object detection and depth estimation tasks mutually constrain each other. Second, we propose a global multi-scale feature extraction scheme that combines the Global Context (GC) and Atrous Spatial Pyramid Pooling (ASPP) blocks in an appropriate way, promoting effective feature learning and collaborative learning between object detection and depth estimation. Comprehensive experiments on the KITTI and Cityscapes datasets show that our approach achieves high mAP and low distance estimation error, outperforming other state-of-the-art methods.
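The perspective projection principle underlying the projective consistency loss can be sketched as follows. Under a pinhole camera model, an object of real-world height H that appears h pixels tall at focal length f lies at depth z = f · H / h, so a predicted bounding-box height implies a depth that can be checked against the depth head's prediction. This is a minimal illustrative sketch, not the paper's exact formulation: the function names and the choice of an L1 penalty in log space are assumptions.

```python
import math

def depth_from_box_height(pixel_height: float, real_height: float,
                          focal_length: float) -> float:
    """Pinhole model: depth z = f * H / h for an object of real height H
    appearing h pixels tall at focal length f."""
    return focal_length * real_height / pixel_height

def projective_consistency_loss(pred_depth: float, pred_pixel_height: float,
                                real_height: float, focal_length: float) -> float:
    """Penalize disagreement between the predicted depth and the depth
    implied by the predicted box height. Log-space L1 is used here for
    scale invariance (an assumption, not necessarily the paper's choice)."""
    implied_depth = depth_from_box_height(pred_pixel_height,
                                          real_height, focal_length)
    return abs(math.log(pred_depth) - math.log(implied_depth))

# Example: a 1.5 m tall object, 100 px tall in an image with f = 700 px,
# implies a depth of 700 * 1.5 / 100 = 10.5 m. A depth prediction matching
# that value incurs zero loss; deviations are penalized symmetrically.
```

Coupling the two heads this way means a detection error (wrong box height) and a depth error cannot both go unnoticed: each task supplies a geometric constraint on the other.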



Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Huabiao Qin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lian, G., Wang, Y., Qin, H. et al. Towards unified on-road object detection and depth estimation from a single image. Int. J. Mach. Learn. & Cyber. 13, 1231–1241 (2022). https://doi.org/10.1007/s13042-021-01444-z

