Abstract
Small object detection is a longstanding challenge in object detection, and high accuracy on small objects is crucial for autonomous driving. This article studies small object detection algorithms for driving scenarios. To meet the need for higher accuracy and fewer parameters in object detection for autonomous driving, we propose LSD-YOLO, a small object detection algorithm with higher average precision and fewer parameters. Building upon YOLOv5, we fully exploit small-scale feature maps to strengthen the network's ability to detect small objects. We also introduce a new structure, FasterC3, to reduce the network's latency and parameter count. To locate attention regions in complex driving scenes, we integrate Coordinate Attention and compare multiple placement schemes to determine the optimal one. Furthermore, we adopt a spatial pyramid pooling method called LeakySPPF (Wen and Zhang, in: Jin Z, Jiang Y, Buchmann RA, Bi Y, Ghiran A-M, Ma W (eds.) Knowledge Science, Engineering and Management, pp. 39-46. Springer, Cham, 2023) to further improve network speed, achieving up to 15% faster computation. Finally, to better match driving scenarios, we build a medium-sized dataset, Cone4k, to supplement under-represented categories in the VisDrone dataset. Extensive experiments show that our LSD-YOLO(s) achieves an mAP of 24.9 and an F1 score of 48.6 on the VisDrone2021 dataset, improvements of 4.6% and 3.6% over YOLOv5(s), while reducing the parameter count by 7.5%.
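For context on the spatial pyramid pooling change: YOLOv5's SPPF block replaces SPP's parallel 5, 9, and 13 max-pools with three stacked 5x5 max-pools of stride 1, which reproduce the same receptive fields at lower cost; the cited LeakySPPF variant reportedly also swaps the activation for LeakyReLU, which is not modeled here. The following is a minimal pure-Python sketch of the stacking equivalence on a 1-D signal; the `maxpool1d` helper is illustrative only, not code from the paper.

```python
def maxpool1d(xs, k):
    """Max-pool with kernel size k, stride 1, and 'same' padding (pad with -inf)."""
    pad = k // 2
    padded = [float("-inf")] * pad + list(xs) + [float("-inf")] * pad
    return [max(padded[i:i + k]) for i in range(len(xs))]

signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# One wide 9-tap pool...
wide = maxpool1d(signal, 9)
# ...equals two stacked 5-tap pools, and a 13-tap pool equals three of them:
# each extra 5-tap stage widens the receptive field by 4 on either side.
stacked2 = maxpool1d(maxpool1d(signal, 5), 5)
stacked3 = maxpool1d(stacked2, 5)

assert wide == stacked2
assert maxpool1d(signal, 13) == stacked3
```

The same argument carries over to the 2-D 5x5 pools used in SPPF: stacking small pools gives the 5/9/13 pyramid of SPP while reusing intermediate results, which is where the speed advantage comes from.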
Availability of supporting data
All data in this paper were obtained from our own experiments and are reported accurately.
References
Wen Z, Su J, Zhang Y (2023) SIE-YOLOv5: improved YOLOv5 for small object detection in drone-captured scenarios. In: Jin Z, Jiang Y, Buchmann RA, Bi Y, Ghiran A-M, Ma W (eds) Knowledge science, engineering and management. Springer, Cham, pp 39–46
Lin T, Maire M, Belongie SJ, Bourdev LD, Girshick RB, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. CoRR abs/1405.0312
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vision (IJCV) 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y
Everingham M, Gool LV, Williams CKI, Winn JM, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88:303–338
Chen C, Liu M-Y, Tuzel O, Xiao J (2017) R-cnn for small object detection. In: Lai S-H, Lepetit V, Nishino K, Sato Y (eds) Computer vision - ACCV 2016. Springer, Cham, pp 214–230
Lin T, Goyal P, Girshick RB, He K, Dollár P (2017) Focal loss for dense object detection. CoRR abs/1708.02002
Tan M, Pang R, Le QV (2019) EfficientDet: scalable and efficient object detection. CoRR abs/1911.09070
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need
Redmon J, Divvala SK, Girshick RB, Farhadi A (2015) You only look once: unified, real-time object detection. CoRR abs/1506.02640
Redmon J, Farhadi A (2016) YOLO9000: better, faster, stronger. CoRR abs/1612.08242
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. CoRR abs/1804.02767
Bochkovskiy A, Wang C, Liao HM (2020) YOLOv4: optimal speed and accuracy of object detection. CoRR abs/2004.10934
Li C, Li L, Jiang H, Weng K, Geng Y, Li L, Ke Z, Li Q, Cheng M, Nie W, Li Y, Zhang B, Liang Y, Zhou L, Xu X, Chu X, Wei X, Wei X (2022) YOLOv6: a single-stage object detection framework for industrial applications. https://doi.org/10.48550/ARXIV.2209.02976
Wang C-Y, Bochkovskiy A, Liao H-YM (2022) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv. https://doi.org/10.48550/ARXIV.2207.02696
Wang C-Y, Liao H-YM, Yeh I-H, Wu Y-H, Chen P-Y, Hsieh J-W (2019) CSPNet: a new backbone that can enhance learning capability of CNN
Zhang Y-F, Ren W, Zhang Z, Jia Z, Wang L, Tan T (2022) Focal and efficient IOU loss for accurate bounding box regression
Zheng Z, Wang P, Ren D, Liu W, Ye R, Hu Q, Zuo W (2022) Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Transact Cybern 52(8):8574–8586. https://doi.org/10.1109/TCYB.2021.3095305
He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR abs/1406.4729
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014), Generative adversarial networks
Mirza M, Osindero S (2014), Conditional generative adversarial nets
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN
Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks
Razghandi M, Zhou H, Erol-Kantarci M, Turgut D (2022) Variational autoencoder generative adversarial network for synthetic data generation in smart home
Prajapati K, Chudasama V, Patel H, Upla K, Ramachandra R, Raja K, Busch C (2020) Unsupervised single image super-resolution network (USISResNet) for real-world data using generative adversarial network. In: 2020 IEEE/CVF Conference on computer vision and pattern recognition workshops (CVPRW), pp 1904–1913. https://doi.org/10.1109/CVPRW50498.2020.00240
Zhang K, Liang J, Gool LV, Timofte R (2021) Designing a practical degradation model for deep blind image super-resolution
Han W, Zhang Z, Zhang Y, Yu J, Chiu C-C, Qin J, Gulati A, Pang R, Wu Y (2020) ContextNet: improving convolutional neural networks for automatic speech recognition with global context
Bell S, Zitnick CL, Bala K, Girshick R (2015) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks
Yuan Y, Xiong Z, Wang Q (2019) VSSA-NET: vertical spatial sequence attention network for traffic sign detection. IEEE Trans Image Process 28(7):3423–3434. https://doi.org/10.1109/tip.2019.2896952
Cui L, Ma R, Lv P, Jiang X, Gao Z, Zhou B, Xu M (2020) MDSSD: multi-scale deconvolutional single shot detector for small objects
Sun K, Zhang J, Liu J, Yu R, Song Z (2021) Drcnn: dynamic routing convolutional neural network for multi-view 3d object recognition. IEEE Transact Image Process 30:868–877. https://doi.org/10.1109/TIP.2020.3039378
Liu Z, Du J, Tian F, Wen J (2019) MR-CNN: a multi-scale region-based convolutional neural network for small traffic sign recognition. IEEE Access 7:57120–57128. https://doi.org/10.1109/ACCESS.2019.2913882
Zhang G, Lu S, Zhang W (2019) CAD-net: a context-aware detection network for objects in remote sensing imagery. IEEE Trans Geosci Remote Sens 57(12):10015–10024. https://doi.org/10.1109/tgrs.2019.2930982
Chen D, Miao D, Zhao X (2023) Hyneter: hybrid network transformer for object detection
Ding J, Li W, Pei L, Yang M, Ye C, Yuan B (2023) SW-YOLOX: an anchor-free detector based transformer for sea surface object detection. Expert Syst Appl 217:119560. https://doi.org/10.1016/j.eswa.2023.119560
Yang H, Yang Z, Hu A, Liu C, Cui TJ, Miao J (2023) Unifying convolution and transformer for efficient concealed object detection in passive millimeter-wave images. IEEE Trans Circuits Syst Video Technol 33(8):3872–3887. https://doi.org/10.1109/TCSVT.2023.3234311
Yang C, Huang Z, Wang N (2022) QueryDet: cascaded sparse query for accelerating high-resolution small object detection. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 13658–13667. https://doi.org/10.1109/CVPR52688.2022.01330
Sunkara R, Luo T (2022) No more strided convolutions or pooling: a new CNN building block for low-resolution images and small objects
Zhu X, Lyu S, Wang X, Zhao Q (2021) TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation
Neubeck A, Gool LV (2006) Efficient non-maximum suppression. In: 18th International conference on pattern recognition (ICPR'06), vol 3, pp 850–855
Chen J, Kao S-H, He H, Zhuo W, Wen S, Lee C-H, Chan S-HG (2023) Run, don't walk: chasing higher FLOPS for faster neural networks
Hu J, Shen L, Albanie S, Sun G, Wu E (2019) Squeeze-and-excitation networks
Woo S, Park J, Lee J, Kweon IS (2018) CBAM: convolutional block attention module. CoRR abs/1807.06521
Gu R, Wang G, Song T, Huang R, Aertsen M, Deprest J, Ourselin S, Vercauteren T, Zhang S (2021) CA-net: comprehensive attention convolutional neural networks for explainable medical image segmentation. IEEE Trans Med Imaging 40(2):699–711. https://doi.org/10.1109/tmi.2020.3035253
Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2019) Distance-IoU loss: faster and better learning for bounding box regression
Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2020) GhostNet: more features from cheap operations
Zhang X, Zhou X, Lin M, Sun J (2017) ShuffleNet: an extremely efficient convolutional neural network for mobile devices
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications
Xu S, Wang X, Lv W, Chang Q, Cui C, Deng K, Wang G, Dang Q, Wei S, Du Y, Lai B (2022) PP-YOLOE: an evolved version of YOLO
Acknowledgements
We would like to thank the laboratory of Capital Normal University for its equipment, teachers’ guidance, and students’ help.
Funding
This research received no external funding and was supported entirely by the laboratory's existing funds.
Author information
Contributions
In this research, Zonghui Wen was responsible for proposing the innovation points, designing and implementing the experiments, and writing the paper; Jia Su and Yongxiang Zhang guided the paper writing; Mingyu Li, Guoxi Gan, Shenmeng Zhang, and Deyu Fan conducted part of the experiments.
Ethics declarations
Conflict of interest
We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical Approval
None.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wen, Z., Su, J., Zhang, Y. et al. A lightweight small object detection algorithm based on improved YOLOv5 for driving scenarios. Int J Multimed Info Retr 12, 38 (2023). https://doi.org/10.1007/s13735-023-00305-5