Abstract
The perception system of an autonomous vehicle relies mainly on object detection algorithms to locate and analyze obstacles. Although object detection has advanced rapidly, balancing real-time inference with high detection accuracy remains challenging in practical driving scenarios. To address this problem, this paper takes YOLOv8n as the baseline model and proposes an object detection network named SES-YOLOv8n. First, the SPPF module is replaced with the SPPCSPC module to further enhance the model's ability to fuse feature maps at different scales. Second, the efficient multi-scale attention (EMA) module is introduced into the C2f module of the backbone network, improving the perception of critical regions and the efficiency of feature extraction. Finally, SPD-Conv modules replace part of the strided convolutions in the backbone, so that downsampling retains feature information more effectively and improves the network's accuracy and learning ability. Experimental results on the KITTI and BDD100K datasets show that the improved model reaches a mean average precision of 92.7% and 41.9%, respectively, 3.4 and 5.0 percentage points higher than the baseline model. The model achieves real-time image processing in general scenes while maintaining high detection accuracy.
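The SPD-Conv building block mentioned above replaces a strided convolution with a space-to-depth rearrangement followed by a non-strided convolution, so that downsampling discards no pixel information. The sketch below is an illustration only, written in plain Python over nested lists rather than the tensor operations a real implementation would use, and it shows only the space-to-depth step (the follow-up convolution is omitted): with a scale of 2, a C×H×W feature map becomes a 4C×(H/2)×(W/2) map.

```python
def space_to_depth(x, scale=2):
    """Rearrange a feature map so spatial detail moves into channels.

    x: nested lists with shape [C][H][W]; H and W must be divisible by scale.
    Returns nested lists with shape [C*scale*scale][H//scale][W//scale].
    """
    c = len(x)
    h, w = len(x[0]), len(x[0][0])
    assert h % scale == 0 and w % scale == 0, "H and W must divide by scale"
    out = []
    for dj in range(scale):          # row offset inside each scale x scale patch
        for di in range(scale):      # column offset inside each patch
            for ch in range(c):
                # Take every scale-th pixel starting at offset (dj, di):
                # one quarter of the map per offset when scale == 2.
                out.append([[x[ch][j][i] for i in range(di, w, scale)]
                            for j in range(dj, h, scale)])
    return out


# A 1-channel 2x2 map becomes four 1x1 channels; no value is dropped.
demo = space_to_depth([[[1, 2], [3, 4]]], scale=2)
```

Because every input value survives the rearrangement, a subsequent stride-1 convolution can learn which details to keep, instead of a stride-2 convolution discarding them up front.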
Data availability
The image dataset used in this research is available online.
Funding
This work was supported by the Natural Science Foundation of Hebei Province (F2021402011), project "Research on Key Technologies of Intelligent Equipment for Mine Powered by Pure Clean Energy".
Author information
Contributions
All authors contributed their insights to the research concept and, after review and discussion, unanimously approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, Y., Zhang, Y., Wang, H. et al. SES-YOLOv8n: automatic driving object detection algorithm based on improved YOLOv8. SIViP (2024). https://doi.org/10.1007/s11760-024-03003-9