CNN-Based Object Detection and Distance Prediction for Autonomous Driving Using Stereo Images

Song, Jin Gyu; Lee, Joon Woong

doi:10.1007/s12239-023-0064-z

Published: 12 May 2023

International Journal of Automotive Technology, Volume 24, pages 773–786 (2023)

Jin Gyu Song¹ &
Joon Woong Lee¹

Abstract

Convolutional neural networks (CNNs) have been successful at tasks such as object detection, but they are time-consuming to run, which makes them difficult to apply to autonomous driving. Moreover, most autonomous driving technologies require both object detection and distance prediction, and CNNs that predict distance are even more time-consuming than object detection models. Autonomous driving applications also require both tasks to be accurate. This paper proposes an end-to-end trainable CNN that meets these requirements: it performs accurate object detection and distance prediction in real time using stereo images. We demonstrate the superiority of the proposed CNN using stereo images from the KITTI 3D object detection dataset.
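
For readers unfamiliar with why stereo images carry distance information, the following is a minimal sketch of classical rectified-stereo triangulation, where depth Z follows from focal length f, camera baseline B, and pixel disparity d as Z = f · B / d. It is not the network proposed in this paper, whose architecture the abstract does not describe. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
def disparity_to_depth(disparity_px: float,
                       focal_length_px: float,
                       baseline_m: float) -> float:
    """Classical rectified-stereo triangulation: Z = f * B / d.

    Assumption: the two images are rectified, so a scene point appears
    on the same row in both images and differs only by a horizontal
    shift (the disparity, in pixels).
    """
    if disparity_px <= 0:
        # Zero or negative disparity has no finite depth in front of the rig.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Illustrative values only; they are not taken from the paper or any dataset.
FOCAL_LENGTH_PX = 700.0   # hypothetical focal length in pixels
BASELINE_M = 0.5          # hypothetical distance between the two cameras

depth_m = disparity_to_depth(35.0, FOCAL_LENGTH_PX, BASELINE_M)
print(f"Estimated distance: {depth_m:.1f} m")
```

Because depth is inversely proportional to disparity, a fixed one-pixel matching error produces a larger distance error for far objects than for near ones, which is one reason distance accuracy is hard to achieve at range.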

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2016R1D1A1B02014422).

Author information

Authors and Affiliations

Department of Industrial Engineering, Chonnam National University, Gwangju, 61186, Korea
Jin Gyu Song & Joon Woong Lee

Corresponding author

Correspondence to Joon Woong Lee.

About this article

Cite this article

Song, J.G., Lee, J.W. CNN-Based Object Detection and Distance Prediction for Autonomous Driving Using Stereo Images. Int. J. Automot. Technol. 24, 773–786 (2023). https://doi.org/10.1007/s12239-023-0064-z

Received: 26 August 2022
Revised: 28 November 2022
Accepted: 21 December 2022
Published: 12 May 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s12239-023-0064-z
