
SES-YOLOv8n: automatic driving object detection algorithm based on improved YOLOv8

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

The perception system of an autonomous vehicle relies mainly on object detection algorithms to locate obstacles for recognition and analysis. Although object detection algorithms have developed rapidly, balancing real-time inference with high detection accuracy remains difficult in practical driving scenarios. To address this problem, this paper takes YOLOv8n as the baseline and proposes an object detection network named SES-YOLOv8n. First, the SPPF module is replaced with the SPPCSPC module to further strengthen the model's ability to fuse feature maps at different scales. Second, the efficient multi-scale attention (EMA) module is introduced into the C2f blocks of the backbone, improving the perception of critical regions and the efficiency of feature extraction. Finally, SPD-Conv modules replace some of the strided convolutions in the backbone, performing downsampling while retaining more feature information and thereby improving the network's accuracy and learning ability. Experiments on the KITTI and BDD100K datasets show that the improved model reaches an average precision of 92.7% and 41.9%, respectively, 3.4 and 5.0 percentage points higher than the baseline. The model delivers real-time image processing in common scenes while maintaining high detection accuracy.
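To make the downsampling change concrete, here is a minimal PyTorch sketch of the SPD-Conv idea (a space-to-depth rearrangement followed by a stride-1 convolution), based on the general design described by Sunkara and Luo (2022). The module name, channel widths, and activation choice are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Sketch of an SPD-Conv block: space-to-depth (lossless 2x downsampling)
    followed by a non-strided convolution. Illustrative only; not necessarily
    the exact block used in SES-YOLOv8n."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Space-to-depth with scale 2 multiplies the channel count by 4.
        self.conv = nn.Conv2d(4 * in_channels, out_channels,
                              kernel_size=3, stride=1, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()  # SiLU, matching YOLOv8's convention

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Move each 2x2 spatial block into the channel dimension: H and W are
        # halved, but no pixel is discarded (unlike a stride-2 convolution).
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.act(self.bn(self.conv(x)))

# Example: a 160x160 feature map is downsampled to 80x80 without spatial loss.
block = SPDConv(64, 128)
out = block(torch.randn(1, 64, 160, 160))
print(out.shape)  # torch.Size([1, 128, 80, 80])
```

Swapping a stride-2 convolution for such a block costs some extra computation but keeps all spatial information available to the following layers, which is the property the abstract credits for the accuracy gain.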



Data availability

The image dataset used in this research is available online.


Funding

This work was supported by the Natural Science Foundation of Hebei Province under the project "Research on Key Technologies of Intelligent Equipment for Mine Powered by Pure Clean Energy" (Grant No. F2021402011).

Author information


Contributions

All authors contributed their insights to the research concept and, after review and discussion, unanimously approved the final manuscript.

Corresponding author

Correspondence to Yuhang Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, Y., Zhang, Y., Wang, H. et al. SES-YOLOv8n: automatic driving object detection algorithm based on improved YOLOv8. SIViP (2024). https://doi.org/10.1007/s11760-024-03003-9

