Abstract
Convolutional neural networks (CNNs) have been successful in tasks such as object detection, but they are computationally expensive, which makes them difficult to apply to autonomous driving. Moreover, most autonomous driving technologies require both object detection and distance prediction, and CNNs that predict distance are even more time-consuming than object detection models. Autonomous driving applications also require both tasks to be accurate. This paper proposes an end-to-end trainable CNN that meets these requirements: it performs accurate object detection and distance prediction in real time using stereo images. We demonstrate the superiority of the proposed CNN using stereo images from the KITTI 3D object detection dataset.
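As background on why stereo images carry distance information, the sketch below applies the classical pinhole stereo relation Z = f * B / d for a rectified camera pair. It is not the proposed CNN, whose internals the abstract does not describe. The function name and the numeric values (focal length, baseline, disparity) are hypothetical and chosen only for illustration.

```python
def distance_from_disparity(disparity_px: float,
                            focal_length_px: float,
                            baseline_m: float) -> float:
    """Return the distance (in meters) to a point seen by a rectified stereo pair.

    Uses the classical relation Z = f * B / d, where f is the focal length
    in pixels, B is the distance between the two cameras in meters, and d is
    the horizontal disparity of the point in pixels.
    """
    if disparity_px <= 0:
        # Zero or negative disparity corresponds to a point at infinity
        # or to an invalid match, so no finite distance can be returned.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Illustrative (hypothetical) values: f = 700 px, B = 0.5 m, d = 35 px.
example_distance = distance_from_disparity(35.0, 700.0, 0.5)
print(example_distance)  # 10.0
```

Because distance is inversely proportional to disparity, a one-pixel matching error matters far more for distant objects than for nearby ones, which is one reason accurate distance prediction from stereo images is demanding.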
Acknowledgement
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2016R1D1A1B02014422).
Cite this article
Song, J.G., Lee, J.W. CNN-Based Object Detection and Distance Prediction for Autonomous Driving Using Stereo Images. Int. J. Automot. Technol. 24, 773–786 (2023). https://doi.org/10.1007/s12239-023-0064-z
DOI: https://doi.org/10.1007/s12239-023-0064-z