Abstract
Small object detection is a longstanding challenge in object detection, and high accuracy on small objects is crucial for autonomous driving. This article studies small object detection algorithms for driving scenarios. To meet the need for higher accuracy and fewer parameters in object detection for autonomous driving, we propose LSD-YOLO, a small object detection algorithm with higher average precision and fewer parameters. Building upon YOLOv5, we fully exploit small-scale feature maps to strengthen the network's ability to detect small objects. We also introduce a new structure, FasterC3, to reduce the network's latency and parameter count. To locate attention regions in complex driving scenes, we integrate Coordinate Attention and compare multiple placement schemes to determine the optimal one. Furthermore, we adopt a spatial pyramid pooling method called LeakySPPF (Wen and Zhang, in: Jin Z, Jiang Y, Buchmann RA, Bi Y, Ghiran A-M, Ma W (eds.) Knowledge Science, Engineering and Management, pp. 39-46. Springer, Cham, 2023) to further improve network speed, achieving up to 15% faster computation. Finally, to better match driving scenarios, we build a medium-sized dataset, Cone4k, to supplement under-represented categories in the VisDrone dataset. Extensive experiments show that our LSD-YOLO(s) achieves an mAP of 24.9 and an F1 score of 48.6 on the VisDrone2021 dataset, improvements of 4.6% and 3.6% over YOLOv5(s), while reducing the parameter count by 7.5%.
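For context on the spatial pyramid pooling change: YOLOv5's SPPF block replaces SPP's parallel 5, 9, and 13 max-pools with three stacked 5x5 max-pools of stride 1, which reproduce the same receptive fields at lower cost; the cited LeakySPPF variant reportedly also swaps the activation for LeakyReLU, which is not modeled here. The following is a minimal pure-Python sketch of the stacking equivalence on a 1-D signal; the `maxpool1d` helper is illustrative only, not code from the paper.

```python
def maxpool1d(xs, k):
    """Max-pool with kernel size k, stride 1, and 'same' padding (pad with -inf)."""
    pad = k // 2
    padded = [float("-inf")] * pad + list(xs) + [float("-inf")] * pad
    return [max(padded[i:i + k]) for i in range(len(xs))]

signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# One wide 9-tap pool...
wide = maxpool1d(signal, 9)
# ...equals two stacked 5-tap pools, and a 13-tap pool equals three of them:
# each extra 5-tap stage widens the receptive field by 4 on either side.
stacked2 = maxpool1d(maxpool1d(signal, 5), 5)
stacked3 = maxpool1d(stacked2, 5)

assert wide == stacked2
assert maxpool1d(signal, 13) == stacked3
```

The same argument carries over to the 2-D 5x5 pools used in SPPF: stacking small pools gives the 5/9/13 pyramid of SPP while reusing intermediate results, which is where the speed advantage comes from.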
Availability of supporting data
All data in this paper were obtained from our own experiments and are reported accurately.
References
Wen Z, Su J, Zhang Y (2023) SIE-YOLOv5: improved YOLOv5 for small object detection in drone-captured scenarios. In: Jin Z, Jiang Y, Buchmann RA, Bi Y, Ghiran A-M, Ma W (eds) Knowledge science, engineering and management. Springer, Cham, pp 39–46
Lin T, Maire M, Belongie SJ, Bourdev LD, Girshick RB, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. CoRR abs/1405.0312
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vision (IJCV) 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y
Everingham M, Gool LV, Williams CKI, Winn JM, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88:303–338
Chen C, Liu M-Y, Tuzel O, Xiao J (2017) R-cnn for small object detection. In: Lai S-H, Lepetit V, Nishino K, Sato Y (eds) Computer vision - ACCV 2016. Springer, Cham, pp 214–230
Lin T, Goyal P, Girshick RB, He K, Dollár P (2017) Focal loss for dense object detection. CoRR abs/1708.02002
Tan M, Pang R, Le QV (2019) EfficientDet: scalable and efficient object detection. CoRR abs/1911.09070
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need
Redmon J, Divvala SK, Girshick RB, Farhadi A (2015) You only look once: unified, real-time object detection. CoRR abs/1506.02640
Redmon J, Farhadi A (2016) YOLO9000: better, faster, stronger. CoRR abs/1612.08242
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. CoRR abs/1804.02767
Bochkovskiy A, Wang C, Liao HM (2020) YOLOv4: optimal speed and accuracy of object detection. CoRR abs/2004.10934
Li C, Li L, Jiang H, Weng K, Geng Y, Li L, Ke Z, Li Q, Cheng M, Nie W, Li Y, Zhang B, Liang Y, Zhou L, Xu X, Chu X, Wei X, Wei X (2022) YOLOv6: a single-stage object detection framework for industrial applications. https://doi.org/10.48550/ARXIV.2209.02976
Wang C-Y, Bochkovskiy A, Liao H-YM (2022) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv. https://doi.org/10.48550/ARXIV.2207.02696
Wang C-Y, Liao H-YM, Yeh I-H, Wu Y-H, Chen P-Y, Hsieh J-W (2019) CSPNet: a new backbone that can enhance learning capability of CNN
Zhang Y-F, Ren W, Zhang Z, Jia Z, Wang L, Tan T (2022) Focal and efficient IOU loss for accurate bounding box regression
Zheng Z, Wang P, Ren D, Liu W, Ye R, Hu Q, Zuo W (2022) Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Transact Cybern 52(8):8574–8586. https://doi.org/10.1109/TCYB.2021.3095305
He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR abs/1406.4729
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014), Generative adversarial networks
Mirza M, Osindero S (2014), Conditional generative adversarial nets
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN
Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks
Razghandi M, Zhou H, Erol-Kantarci M, Turgut D (2022) Variational autoencoder generative adversarial network for synthetic data generation in smart home
Prajapati K, Chudasama V, Patel H, Upla K, Ramachandra R, Raja K, Busch C (2020) Unsupervised single image super-resolution network (USISResNet) for real-world data using generative adversarial network. In: 2020 IEEE/CVF Conference on computer vision and pattern recognition workshops (CVPRW), pp 1904–1913. https://doi.org/10.1109/CVPRW50498.2020.00240
Zhang K, Liang J, Gool LV, Timofte R (2021) Designing a practical degradation model for deep blind image super-resolution
Han W, Zhang Z, Zhang Y, Yu J, Chiu C-C, Qin J, Gulati A, Pang R, Wu Y (2020) ContextNet: improving convolutional neural networks for automatic speech recognition with global context
Bell S, Zitnick CL, Bala K, Girshick R (2015) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks
Yuan Y, Xiong Z, Wang Q (2019) VSSA-NET: vertical spatial sequence attention network for traffic sign detection. IEEE Trans Image Process 28(7):3423–3434. https://doi.org/10.1109/tip.2019.2896952
Cui L, Ma R, Lv P, Jiang X, Gao Z, Zhou B, Xu M (2020) MDSSD: multi-scale deconvolutional single shot detector for small objects
Sun K, Zhang J, Liu J, Yu R, Song Z (2021) Drcnn: dynamic routing convolutional neural network for multi-view 3d object recognition. IEEE Transact Image Process 30:868–877. https://doi.org/10.1109/TIP.2020.3039378
Liu Z, Du J, Tian F, Wen J (2019) MR-CNN: a multi-scale region-based convolutional neural network for small traffic sign recognition. IEEE Access 7:57120–57128. https://doi.org/10.1109/ACCESS.2019.2913882
Zhang G, Lu S, Zhang W (2019) CAD-net: a context-aware detection network for objects in remote sensing imagery. IEEE Trans Geosci Remote Sens 57(12):10015–10024. https://doi.org/10.1109/tgrs.2019.2930982
Chen D, Miao D, Zhao X (2023) Hyneter: hybrid network transformer for object detection
Ding J, Li W, Pei L, Yang M, Ye C, Yuan B (2023) SW-YOLOX: an anchor-free detector based transformer for sea surface object detection. Expert Syst Appl 217:119560. https://doi.org/10.1016/j.eswa.2023.119560
Yang H, Yang Z, Hu A, Liu C, Cui TJ, Miao J (2023) Unifying convolution and transformer for efficient concealed object detection in passive millimeter-wave images. IEEE Trans Circuits Syst Video Technol 33(8):3872–3887. https://doi.org/10.1109/TCSVT.2023.3234311
Yang C, Huang Z, Wang N (2022) QueryDet: cascaded sparse query for accelerating high-resolution small object detection. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 13658–13667. https://doi.org/10.1109/CVPR52688.2022.01330
Sunkara R, Luo T (2022) No more strided convolutions or pooling: a new CNN building block for low-resolution images and small objects
Zhu X, Lyu S, Wang X, Zhao Q (2021) TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation
Neubeck A, Gool LV (2006) Efficient non-maximum suppression. In: 18th International conference on pattern recognition (ICPR'06), vol 3, pp 850–855
Chen J, Kao S-H, He H, Zhuo W, Wen S, Lee C-H, Chan S-HG (2023) Run, don't walk: chasing higher FLOPS for faster neural networks
Hu J, Shen L, Albanie S, Sun G, Wu E (2019) Squeeze-and-excitation networks
Woo S, Park J, Lee J, Kweon IS (2018) CBAM: convolutional block attention module. CoRR abs/1807.06521
Gu R, Wang G, Song T, Huang R, Aertsen M, Deprest J, Ourselin S, Vercauteren T, Zhang S (2021) CA-net: comprehensive attention convolutional neural networks for explainable medical image segmentation. IEEE Trans Med Imaging 40(2):699–711. https://doi.org/10.1109/tmi.2020.3035253
Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2019) Distance-IoU loss: faster and better learning for bounding box regression
Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2020) GhostNet: more features from cheap operations
Zhang X, Zhou X, Lin M, Sun J (2017) ShuffleNet: an extremely efficient convolutional neural network for mobile devices
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications
Xu S, Wang X, Lv W, Chang Q, Cui C, Deng K, Wang G, Dang Q, Wei S, Du Y, Lai B (2022) PP-YOLOE: an evolved version of YOLO
Acknowledgements
We would like to thank the laboratory of Capital Normal University for its equipment, teachers’ guidance, and students’ help.
Funding
This research received no external funding and was supported entirely by the laboratory's existing funds.
Author information
Contributions
In this research, Zonghui Wen was responsible for proposing the innovation points, designing and implementing the experiments, and writing the paper; Jia Su and Yongxiang Zhang guided the paper writing; Mingyu Li, Guoxi Gan, Shenmeng Zhang, and Deyu Fan conducted part of the experiments.
Ethics declarations
Conflict of interest
We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical Approval
None.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wen, Z., Su, J., Zhang, Y. et al. A lightweight small object detection algorithm based on improved YOLOv5 for driving scenarios. Int J Multimed Info Retr 12, 38 (2023). https://doi.org/10.1007/s13735-023-00305-5