Small object detection model for UAV aerial image based on YOLOv7

  • Original Paper
Signal, Image and Video Processing

Abstract

Target detection in Unmanned Aerial Vehicle (UAV) aerial images faces two main problems: small targets and target occlusion. To improve detection accuracy while maintaining efficiency, this work introduces SOD-YOLOv7, a small object detection model for UAV aerial images based on the real-time detector YOLOv7. To address small object detection, we design a module that combines Swin Transformer and convolution to better capture the global context of small objects in the image, and we introduce the Bi-Level Routing Attention (BRA) mechanism to strengthen the model's focus on small objects. To improve detection at multiple scales, we add detection branches. To handle occluded objects, we incorporate a dynamic detection head with deformable convolution and attention mechanisms, which enhances the model's spatial awareness of targets. Experimental results on the VisDrone and CARPK UAV image datasets show that the mean average precision (mAP@0.5) of our model reaches 53.2% and 98.5%, respectively. Compared with the original YOLOv7, our model improves mAP@0.5 by 4.3% and 0.3%, respectively, demonstrating better performance in detecting small objects. The code will be released soon at https://github.com/Gentle-Hui/SOD-YOLOv7.
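The mAP@0.5 figures above count a predicted box as correct only when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion, not the authors' evaluation code; the names `iou` and `is_match` and the example boxes are hypothetical and chosen only for illustration. It shows why small targets are harder under this metric: the same localization error of a few pixels costs a small box far more IoU than a large one.

```python
from typing import Sequence

# A box is (x1, y1, x2, y2) in pixels, with x1 < x2 and y1 < y2 (assumed format).
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True if the prediction meets the IoU threshold used by mAP@0.5."""
    return iou(pred, truth) >= threshold


# Hypothetical boxes: a 10x10 target and a 100x100 target,
# each predicted with the same 4-pixel horizontal offset.
small_truth, small_pred = (0, 0, 10, 10), (4, 0, 14, 10)
large_truth, large_pred = (0, 0, 100, 100), (4, 0, 104, 100)

print(is_match(small_pred, small_truth))  # IoU = 60 / 140, below 0.5
print(is_match(large_pred, large_truth))  # IoU = 9600 / 10400, above 0.5
```

With a 4-pixel offset the small box falls below the threshold while the large box still matches comfortably, which is one reason a detector can localize large objects acceptably yet score poorly on small ones.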


Data availability

The code will be available soon at https://github.com/Gentle-Hui/SOD-YOLOv7.

Funding

This work was supported by the Natural Science Basic Research Program of Shaanxi under Grant 2023-JC-YB-826, the Scientific Research Program Funded by Shaanxi Provincial Education Department under Grant 22JP028, and the Joint Foundation of Shaanxi Computer Society & Xi'an Xiangteng Microelectronics Technology Co., Ltd. under Grant XT-QC-202309-119287.

Author information

Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Jinguang Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

The authors have no competing interests to declare that are relevant to the content of this article.

Consent to participate

Informed consent was obtained from all individual participants included in the study.

About this article

Cite this article

Chen, J., Wen, R. & Ma, L. Small object detection model for UAV aerial image based on YOLOv7. SIViP 18, 2695–2707 (2024). https://doi.org/10.1007/s11760-023-02941-0

  • DOI: https://doi.org/10.1007/s11760-023-02941-0
