Object Detection with Depth Information in Road Scenes

Liu, Ruowang; Chen, Xinbo; Tao, Bo

doi:10.1007/978-981-99-8021-5_15

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1919)

Included in the following conference series:

International Conference on Cognitive Systems and Signal Processing

Abstract

In recent years, depth estimation has advanced significantly with the development of deep learning. Depth estimation uses pixel transformations in an image to recover the distance from each point in the scene to the camera, producing a depth map. Object detection classifies and localizes the objects in a given image, identifying what they are and where they are. Depth estimation on its own, however, predicts only the depth of each pixel and does not detect or recognize objects. To overcome this limitation and integrate object detection into the depth estimation process, this paper proposes a novel self-supervised monocular depth estimation algorithm that leverages an attention mechanism. By combining object detection and depth estimation, a real-time multi-task model is designed that detects objects and estimates their depth simultaneously. The framework comprises four essential components: an object detection sub-network, a depth estimation sub-network, a lateral sharing unit, and an attention loss. These components work together to improve both the accuracy of object distance estimation and object detection performance. Experiments show that the proposed approach effectively estimates distances to objects and improves object detection accuracy.
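
To make the idea of attaching a distance to each detected object concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes a dense depth map (in metres) and a set of bounding boxes are already available, and it takes the median depth inside each box as that object's distance. The function name `object_distances`, the box layout, and the toy values are all hypothetical choices made for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max), max edges exclusive.
Box = Tuple[int, int, int, int]


def object_distances(
    depth_map: Sequence[Sequence[float]], boxes: Sequence[Box]
) -> List[float]:
    """Return the median depth inside each bounding box.

    Illustrative assumption only: the median is used because it is less
    sensitive than the mean to background pixels that fall inside the box.
    """
    height = len(depth_map)
    width = len(depth_map[0]) if height else 0
    distances = []
    for x_min, y_min, x_max, y_max in boxes:
        # Clip the box to the image bounds before reading depth values.
        x0, x1 = max(0, x_min), min(width, x_max)
        y0, y1 = max(0, y_min), min(height, y_max)
        values = [depth_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
        if not values:
            raise ValueError("box does not overlap the depth map")
        distances.append(median(values))
    return distances


# Toy 4x4 depth map: a near object in the centre, road surface on the last row.
depth_map = [
    [30.0, 30.0, 30.0, 30.0],
    [30.0, 12.0, 12.5, 30.0],
    [30.0, 11.5, 12.0, 30.0],
    [5.0, 5.0, 5.0, 5.0],
]
boxes = [(1, 1, 3, 3), (0, 3, 4, 4)]
distances = object_distances(depth_map, boxes)
print(distances)
```

In a multi-task model of the kind the abstract describes, the depth map and the boxes would come from the two sub-networks in a single forward pass; here they are hard-coded so the sketch runs on its own.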

Acknowledgments

This project is supported by the Higher education teaching reformation project of Hubei province of China (2022231, 2022216), and the graduate teaching reformation project of Wuhan University of Science and Technology (Yjg202202).

Author information

Authors and Affiliations

Key Laboratory of Metallurgical Equipment and Control Technology, Ministry of Education, Wuhan University of Science and Technology, Wuhan, China
Ruowang Liu
Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, Wuhan, China
Xinbo Chen
Precision Manufacturing Institute, Wuhan University of Science and Technology, Wuhan, China
Bo Tao
Research Center for Biomimetic Robot and Intelligent Measurement and Control, Wuhan University of Science and Technology, Wuhan, China
Bo Tao

Corresponding author

Correspondence to Bo Tao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Fuchun Sun
Southern University of Science and Technology, Shenzhen, China
Qinghu Meng
Henan University of Science and Technology, Luoyang, China
Zhumu Fu
Tsinghua University, Beijing, China
Bin Fang

About this paper

Cite this paper

Liu, R., Chen, X., Tao, B. (2024). Object Detection with Depth Information in Road Scenes. In: Sun, F., Meng, Q., Fu, Z., Fang, B. (eds) Cognitive Systems and Information Processing. ICCSIP 2023. Communications in Computer and Information Science, vol 1919. Springer, Singapore. https://doi.org/10.1007/978-981-99-8021-5_15

DOI: https://doi.org/10.1007/978-981-99-8021-5_15
Published: 05 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8020-8
Online ISBN: 978-981-99-8021-5
eBook Packages: Computer Science, Computer Science (R0)
