
Object Detection with Depth Information in Road Scenes

  • Conference paper
Cognitive Systems and Information Processing (ICCSIP 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1919)


Abstract

In recent years, depth estimation has advanced significantly with the development of deep learning. Depth estimation predicts, from pixel-level cues in an image, the distance from each point in the scene to the camera and produces a depth map; on its own it does not identify or recognize objects. Object detection, in contrast, classifies and localizes the objects in a given image. To overcome this limitation and integrate object detection into the depth estimation process, this paper proposes a self-supervised monocular depth estimation algorithm that leverages an attention mechanism. By combining object detection and depth estimation, a real-time multi-task model is designed that detects objects and estimates their depth simultaneously. The framework comprises four components: an object detection sub-network, a depth estimation sub-network, a lateral sharing unit, and an attention loss. These components work together to improve distance estimation for detected objects as well as object detection performance. Experiments show that the proposed approach effectively estimates distances to objects and improves object detection accuracy.
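To make the architecture described in the abstract concrete, the sketch below shows one way such a multi-task network could be organized in PyTorch: a shared backbone, a detection branch and a depth branch, a lateral sharing unit that exchanges features between the two branches, and a sigmoid-bounded disparity output. The module names, channel sizes, and the 1x1-convolution form of the sharing unit are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LateralSharingUnit(nn.Module):
    """Exchange features between the detection and depth branches.

    Hypothetical design: each branch's features are projected with a
    1x1 convolution and added to the other branch; the paper's exact
    sharing unit is not reproduced here.
    """

    def __init__(self, channels):
        super().__init__()
        self.det_to_depth = nn.Conv2d(channels, channels, kernel_size=1)
        self.depth_to_det = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, det_feat, depth_feat):
        shared_det = det_feat + self.depth_to_det(depth_feat)
        shared_depth = depth_feat + self.det_to_depth(det_feat)
        return shared_det, shared_depth


class MultiTaskNet(nn.Module):
    """Toy multi-task network: shared backbone, detection and depth heads."""

    def __init__(self, num_classes=3, channels=64):
        super().__init__()
        # Shared encoder (stand-in for a ResNet/CSPNet-style backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.det_branch = nn.Conv2d(channels, channels, 3, padding=1)
        self.depth_branch = nn.Conv2d(channels, channels, 3, padding=1)
        self.share = LateralSharingUnit(channels)
        # Detection head: per-location class scores plus 4 box offsets.
        self.det_head = nn.Conv2d(channels, num_classes + 4, 1)
        # Depth head: one sigmoid-bounded disparity value per pixel.
        self.depth_head = nn.Conv2d(channels, 1, 1)

    def forward(self, x):
        feat = self.backbone(x)
        det_feat = F.relu(self.det_branch(feat))
        depth_feat = F.relu(self.depth_branch(feat))
        det_feat, depth_feat = self.share(det_feat, depth_feat)
        detections = self.det_head(det_feat)                     # (B, num_classes + 4, H/4, W/4)
        disparity = torch.sigmoid(self.depth_head(depth_feat))   # (B, 1, H/4, W/4)
        return detections, disparity


if __name__ == "__main__":
    net = MultiTaskNet()
    image = torch.randn(1, 3, 192, 640)  # KITTI-like input resolution
    detections, disparity = net(image)
    print(detections.shape, disparity.shape)
```

In a full system the detection output would feed a grid- or anchor-based decoder (e.g. a YOLO-style head) and the disparity map would be converted to depth and trained with a self-supervised photometric loss; one plausible role for the attention loss named in the abstract is to weight that loss toward detected object regions, though the paper's exact formulation is not given here.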

Acknowledgments

This project is supported by the Higher Education Teaching Reform Project of Hubei Province, China (2022231, 2022216), and the Graduate Teaching Reform Project of Wuhan University of Science and Technology (Yjg202202).

Author information

Corresponding author

Correspondence to Bo Tao.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Liu, R., Chen, X., Tao, B. (2024). Object Detection with Depth Information in Road Scenes. In: Sun, F., Meng, Q., Fu, Z., Fang, B. (eds) Cognitive Systems and Information Processing. ICCSIP 2023. Communications in Computer and Information Science, vol 1919. Springer, Singapore. https://doi.org/10.1007/978-981-99-8021-5_15

  • DOI: https://doi.org/10.1007/978-981-99-8021-5_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8020-8

  • Online ISBN: 978-981-99-8021-5

  • eBook Packages: Computer Science (R0)
