CNN-Based Object Detection and Distance Prediction for Autonomous Driving Using Stereo Images

International Journal of Automotive Technology

Abstract

Convolutional neural networks (CNNs) have been successful at tasks such as object detection, but they are computationally expensive, which makes them difficult to apply to autonomous driving. Moreover, most autonomous driving technologies require both object detection and distance prediction, and CNNs that predict distance are even more time-consuming than object detection models. Autonomous driving applications also require both tasks to be performed accurately. This paper proposes an end-to-end trainable CNN that meets these requirements: it performs accurate object detection and distance prediction in real time using stereo images. We demonstrate the superiority of the proposed CNN using stereo images from the KITTI 3D object detection dataset.
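As background on why stereo images carry distance information, the sketch below uses the classical rectified-stereo relation Z = f · B / d (depth equals focal length times baseline divided by disparity). This is not the proposed CNN, whose architecture is not described in this abstract; the function name and the numeric values are illustrative assumptions only.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in metres for a rectified stereo pair, using Z = f * B / d.

    All parameter names are hypothetical; they do not come from the paper.
    """
    if disparity_px <= 0:
        # Zero or negative disparity corresponds to a point at infinity or a bad match.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Illustrative values (assumed, not taken from the paper or from KITTI calibration files).
FOCAL_PX = 720.0    # focal length in pixels
BASELINE_M = 0.5    # distance between the two cameras in metres

if __name__ == "__main__":
    # A 36-pixel shift of an object between the left and right images.
    print(depth_from_disparity(36.0, FOCAL_PX, BASELINE_M))
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant objects than for near ones, which is one reason distance accuracy is harder to achieve at long range.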

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2016R1D1A1B02014422).

Author information

Corresponding author

Correspondence to Joon Woong Lee.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Song, J.G., Lee, J.W. CNN-Based Object Detection and Distance Prediction for Autonomous Driving Using Stereo Images. Int. J. Automot. Technol. 24, 773–786 (2023). https://doi.org/10.1007/s12239-023-0064-z

  • DOI: https://doi.org/10.1007/s12239-023-0064-z
