
Towards unified on-road object detection and depth estimation from a single image

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

On-road object detection based on convolutional neural networks (CNNs) is an important problem in autonomous driving. However, traditional 2D object detection only classifies and localizes objects in image space and cannot recover depth information, and cascading an object detection network with a separate monocular depth estimation network is an inefficient way to realize 2.5D detection. To address this problem, we propose a unified multi-task learning mechanism for object detection and depth estimation. First, we propose a novel loss function, the projective consistency loss, which uses the perspective projection principle to model the relationship between an object's size in the image and its depth, so that the object detection and depth estimation tasks mutually constrain each other. Second, we propose a global multi-scale feature extraction scheme that combines the Global Context (GC) and Atrous Spatial Pyramid Pooling (ASPP) blocks in an appropriate way, promoting effective feature learning and collaborative learning between object detection and depth estimation. Comprehensive experiments on the KITTI and Cityscapes datasets show that our approach achieves high mAP and low distance estimation error, outperforming other state-of-the-art methods.
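The perspective projection principle underlying the projective consistency loss can be sketched as follows. Under a pinhole camera model, an object of real-world height H that appears h pixels tall at focal length f lies at depth z = f · H / h, so a predicted bounding-box height implies a depth that can be checked against the depth head's prediction. This is a minimal illustrative sketch, not the paper's exact formulation: the function names and the choice of an L1 penalty in log space are assumptions.

```python
import math

def depth_from_box_height(pixel_height: float, real_height: float,
                          focal_length: float) -> float:
    """Pinhole model: depth z = f * H / h for an object of real height H
    appearing h pixels tall at focal length f."""
    return focal_length * real_height / pixel_height

def projective_consistency_loss(pred_depth: float, pred_pixel_height: float,
                                real_height: float, focal_length: float) -> float:
    """Penalize disagreement between the predicted depth and the depth
    implied by the predicted box height. Log-space L1 is used here for
    scale invariance (an assumption, not necessarily the paper's choice)."""
    implied_depth = depth_from_box_height(pred_pixel_height,
                                          real_height, focal_length)
    return abs(math.log(pred_depth) - math.log(implied_depth))

# Example: a 1.5 m tall object, 100 px tall in an image with f = 700 px,
# implies a depth of 700 * 1.5 / 100 = 10.5 m. A depth prediction matching
# that value incurs zero loss; deviations are penalized symmetrically.
```

Coupling the two heads this way means a detection error (wrong box height) and a depth error cannot both go unnoticed: each task supplies a geometric constraint on the other.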



Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Huabiao Qin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lian, G., Wang, Y., Qin, H. et al. Towards unified on-road object detection and depth estimation from a single image. Int. J. Mach. Learn. & Cyber. 13, 1231–1241 (2022). https://doi.org/10.1007/s13042-021-01444-z

