
A lightweight small object detection algorithm based on improved YOLOv5 for driving scenarios

  • Regular Paper
  • Published in: International Journal of Multimedia Information Retrieval

Abstract

Small object detection has long been a challenge in the field of object detection, and achieving high detection accuracy, especially for small objects, is crucial for autonomous driving. This article focuses on small object detection algorithms in driving scenarios. To meet the need for higher accuracy and fewer parameters in object detection for autonomous driving, we propose LSD-YOLO, a small object detection algorithm with higher average precision and fewer parameters. Building upon YOLOv5, we fully leverage small-scale feature maps to enhance the network's ability to detect small objects. Additionally, we introduce a new structure called FasterC3 to reduce the network's latency and parameter count. To locate attention regions in complex driving scenarios, we integrate Coordinate Attention and explore multiple placement schemes to determine the optimal one. Furthermore, we adopt a spatial pyramid pooling method called LeakySPPF [1] to further improve network speed, achieving up to 15% faster computation. Finally, to better match driving scenarios, we propose a medium-sized dataset called Cone4k to supplement underrepresented categories in the VisDrone dataset. Extensive experiments show that our proposed LSD-YOLO(s) achieves an mAP of 24.9 and an F1 score of 48.6 on the VisDrone2021 dataset, a 4.6% and 3.6% improvement over YOLOv5(s), while reducing parameter count by 7.5%.
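The LeakySPPF module is only named in the abstract, so its exact design is not given here. As a rough illustration of the SPPF family it builds on, the NumPy sketch below chains three stride-1 max pools of the same kernel size (equivalent to parallel 5/9/13 pooling in classic SPP) and concatenates the results along the channel axis, with a LeakyReLU standing in for the activations of the 1x1 convolutions that a real YOLOv5-style block would learn. All names (`max_pool2d`, `sppf_like`) are ours, not the paper's.

```python
import numpy as np

def max_pool2d(x, k, pad):
    """Stride-1 max pooling with -inf padding on a (C, H, W) array."""
    c, h, w = x.shape
    xp = np.full((c, h + 2 * pad, w + 2 * pad), -np.inf)
    xp[:, pad:pad + h, pad:pad + w] = x
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = xp[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)

def sppf_like(x, k=5):
    """SPPF-style block: three chained k x k stride-1 max pools,
    concatenated with the input along the channel axis.
    Learned 1x1 convolutions of the real module are omitted."""
    x = leaky_relu(x)
    p1 = max_pool2d(x, k, k // 2)          # effective 5x5 receptive field
    p2 = max_pool2d(p1, k, k // 2)         # effective 9x9
    p3 = max_pool2d(p2, k, k // 2)         # effective 13x13
    return np.concatenate([x, p1, p2, p3], axis=0)

feat = np.random.randn(8, 16, 16)
out = sppf_like(feat)
print(out.shape)  # (32, 16, 16)
```

The chaining is why SPPF is faster than SPP at equal output: two 5x5 stride-1 pools give exactly the receptive field of one 9x9 pool, so the large kernels never have to be computed directly.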


Availability of supporting data

All data in this paper were obtained from our experiments and are accurate and valid.

References

  1. Wen Z, Su J, Zhang Y (2023) Sie-yolov5: improved yolov5 for small object detection in drone-captured-scenarios. In: Jin Z, Jiang Y, Buchmann RA, Bi Y, Ghiran A-M, Ma W (eds) Knowledge science, engineering and management. Springer, Cham, pp 39–46

  2. Lin T, Maire M, Belongie SJ, Bourdev LD, Girshick RB, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. CoRR abs/1405.0312

  3. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vision (IJCV) 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y

  4. Everingham M, Gool LV, Williams CKI, Winn JM, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88:303–338

  5. Chen C, Liu M-Y, Tuzel O, Xiao J (2017) R-cnn for small object detection. In: Lai S-H, Lepetit V, Nishino K, Sato Y (eds) Computer vision - ACCV 2016. Springer, Cham, pp 214–230

  6. Lin T, Goyal P, Girshick RB, He K, Dollár P (2017) Focal loss for dense object detection. CoRR abs/1708.02002

  7. Tan M, Pang R, Le QV (2019) Efficientdet: scalable and efficient object detection. CoRR abs/1911.09070

  8. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need

  9. Redmon J, Divvala SK, Girshick RB, Farhadi A (2015) You only look once: unified, real-time object detection. CoRR abs/1506.02640

  10. Redmon J, Farhadi A (2016) YOLO9000: better, faster, stronger. CoRR abs/1612.08242

  11. Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. CoRR abs/1804.02767

  12. Bochkovskiy A, Wang C, Liao HM (2020) Yolov4: optimal speed and accuracy of object detection. CoRR abs/2004.10934

  13. Li C, Li L, Jiang H, Weng K, Geng Y, Li L, Ke Z, Li Q, Cheng M, Nie W, Li Y, Zhang B, Liang Y, Zhou L, Xu X, Chu X, Wei X, Wei X (2022) YOLOv6: a single-stage object detection framework for industrial applications. https://doi.org/10.48550/ARXIV.2209.02976

  14. Wang C-Y, Bochkovskiy A, Liao H-YM (2022) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv. https://doi.org/10.48550/ARXIV.2207.02696

  15. Wang C-Y, Liao H-YM, Yeh I-H, Wu Y-H, Chen P-Y, Hsieh J-W (2019) CSPNet: a new backbone that can enhance learning capability of CNN

  16. Zhang Y-F, Ren W, Zhang Z, Jia Z, Wang L, Tan T (2022) Focal and efficient IOU loss for accurate bounding box regression

  17. Zheng Z, Wang P, Ren D, Liu W, Ye R, Hu Q, Zuo W (2022) Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Transact Cybern 52(8):8574–8586. https://doi.org/10.1109/TCYB.2021.3095305

  18. He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR abs/1406.4729

  19. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial networks

  20. Mirza M, Osindero S (2014) Conditional generative adversarial nets

  21. Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN

  22. Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks

  23. Razghandi M, Zhou H, Erol-Kantarci M, Turgut D (2022) Variational autoencoder generative adversarial network for synthetic data generation in smart home

  24. Prajapati K, Chudasama V, Patel H, Upla K, Ramachandra R, Raja K, Busch C (2020) Unsupervised single image super-resolution network (USISResNet) for real-world data using generative adversarial network. In: 2020 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 1904–1913. https://doi.org/10.1109/CVPRW50498.2020.00240

  25. Zhang K, Liang J, Gool LV, Timofte R (2021) Designing a practical degradation model for deep blind image super-resolution

  26. Han W, Zhang Z, Zhang Y, Yu J, Chiu C-C, Qin J, Gulati A, Pang R, Wu Y (2020) ContextNet: improving convolutional neural networks for automatic speech recognition with global context

  27. Bell S, Zitnick CL, Bala K, Girshick R (2015) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks

  28. Yuan Y, Xiong Z, Wang Q (2019) VSSA-NET: vertical spatial sequence attention network for traffic sign detection. IEEE Trans Image Process 28(7):3423–3434. https://doi.org/10.1109/tip.2019.2896952

  29. Cui L, Ma R, Lv P, Jiang X, Gao Z, Zhou B, Xu M (2020) MDSSD: multi-scale deconvolutional single shot detector for small objects

  30. Sun K, Zhang J, Liu J, Yu R, Song Z (2021) Drcnn: dynamic routing convolutional neural network for multi-view 3d object recognition. IEEE Transact Image Process 30:868–877. https://doi.org/10.1109/TIP.2020.3039378

  31. Liu Z, Du J, Tian F, Wen J (2019) MR-CNN: a multi-scale region-based convolutional neural network for small traffic sign recognition. IEEE Access 7:57120–57128. https://doi.org/10.1109/ACCESS.2019.2913882

  32. Zhang G, Lu S, Zhang W (2019) CAD-net: a context-aware detection network for objects in remote sensing imagery. IEEE Trans Geosci Remote Sens 57(12):10015–10024. https://doi.org/10.1109/tgrs.2019.2930982

  33. Chen D, Miao D, Zhao X (2023) Hyneter: hybrid network transformer for object detection

  34. Ding J, Li W, Pei L, Yang M, Ye C, Yuan B (2023) SW-YOLOX: an anchor-free detector based transformer for sea surface object detection. Expert Syst Appl 217:119560. https://doi.org/10.1016/j.eswa.2023.119560

  35. Yang H, Yang Z, Hu A, Liu C, Cui TJ, Miao J (2023) Unifying convolution and transformer for efficient concealed object detection in passive millimeter-wave images. IEEE Trans Circuits Syst Video Technol 33(8):3872–3887. https://doi.org/10.1109/TCSVT.2023.3234311

  36. Yang C, Huang Z, Wang N (2022) Querydet: cascaded sparse query for accelerating high-resolution small object detection. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 13658–13667. https://doi.org/10.1109/CVPR52688.2022.01330

  37. Sunkara R, Luo T (2022) No more strided convolutions or pooling: a new cnn building block for low-resolution images and small objects

  38. Zhu X, Lyu S, Wang X, Zhao Q (2021) TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios

  39. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation

  40. Neubeck A, Gool LV (2006) Efficient non-maximum suppression. In: 18th international conference on pattern recognition (ICPR'06), vol 3, pp 850–855

  41. Chen J, Kao S-H, He H, Zhuo W, Wen S, Lee C-H, Chan S-HG (2023) Run, don't walk: chasing higher FLOPS for faster neural networks

  42. Hu J, Shen L, Albanie S, Sun G, Wu E (2019) Squeeze-and-excitation networks

  43. Woo S, Park J, Lee J, Kweon IS (2018) CBAM: convolutional block attention module. CoRR abs/1807.06521

  44. Gu R, Wang G, Song T, Huang R, Aertsen M, Deprest J, Ourselin S, Vercauteren T, Zhang S (2021) CA-net: comprehensive attention convolutional neural networks for explainable medical image segmentation. IEEE Trans Med Imaging 40(2):699–711. https://doi.org/10.1109/tmi.2020.3035253

  45. Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2019) Distance-IoU loss: faster and better learning for bounding box regression

  46. Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2020) GhostNet: more features from cheap operations

  47. Zhang X, Zhou X, Lin M, Sun J (2017) ShuffleNet: an extremely efficient convolutional neural network for mobile devices

  48. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications

  49. Xu S, Wang X, Lv W, Chang Q, Cui C, Deng K, Wang G, Dang Q, Wei S, Du Y, Lai B (2022) PP-YOLOE: an evolved version of YOLO

Acknowledgements

We would like to thank the laboratory of Capital Normal University for providing equipment, and the teachers and students there for their guidance and help.

Funding

This research received no additional funding support and was carried out entirely with the laboratory's existing funds.

Author information

Contributions

In this research, Zonghui Wen was responsible for proposing the innovations, experimental design, implementation, and paper writing; Jia Su and Yongxiang Zhang supervised the writing of the paper; Mingyu Li, Guoxi Gan, Shenmeng Zhang, and Deyu Fan conducted part of the experiments.

Corresponding author

Correspondence to Jia Su.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical Approval

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Wen, Z., Su, J., Zhang, Y. et al. A lightweight small object detection algorithm based on improved YOLOv5 for driving scenarios. Int J Multimed Info Retr 12, 38 (2023). https://doi.org/10.1007/s13735-023-00305-5
