Small object detection model for UAV aerial image based on YOLOv7

Chen, Jinguang; Wen, Ronghui; Ma, Lili

doi:10.1007/s11760-023-02941-0

Original Paper
Published: 29 December 2023

Volume 18, pages 2695–2707 (2024)

Signal, Image and Video Processing

Jinguang Chen¹,
Ronghui Wen¹ &
Lili Ma¹

Abstract

Object detection in Unmanned Aerial Vehicle (UAV) aerial images faces two main problems: small targets and target occlusion. To improve detection accuracy while maintaining efficiency, this work introduces SOD-YOLOv7, a small object detection model for UAV aerial images based on the real-time detector YOLOv7. To address small object detection, we design a module that combines Swin Transformer and convolution to better capture the global context of small objects in the image. We also introduce the Bi-Level Routing Attention (BRA) mechanism to strengthen the model's focus on small objects, and we add detection branches to improve detection at multiple scales. To handle occluded objects, we incorporate a dynamic detection head with deformable convolution and attention mechanisms, which enhances the model's spatial awareness of targets. Experiments on the VisDrone and CARPK UAV image datasets show that our model reaches a mean average precision (mAP@0.5) of 53.2% and 98.5%, respectively. Compared with the original YOLOv7, this is an improvement of 4.3 and 0.3 percentage points, respectively, demonstrating better performance in detecting small objects. The code will be released soon at https://github.com/Gentle-Hui/SOD-YOLOv7.
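For readers unfamiliar with the mAP@0.5 metric quoted in the abstract, the following is a minimal sketch of how average precision at an IoU threshold of 0.5 can be computed. It is a simplification under stated assumptions and not the authors' evaluation code: it handles one class in one image, whereas mAP averages the per-class AP over a whole dataset. The function names `iou` and `average_precision` and the example boxes are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Tuple[float, Box]], gts: List[Box],
                      thr: float = 0.5) -> float:
    """AP for one class: area under the interpolated precision-recall curve.

    dets holds (confidence, box) pairs; gts holds ground-truth boxes.
    A detection is a true positive if its best-overlapping ground truth
    has IoU >= thr and has not already been claimed by a higher-scoring one.
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    hits = []
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            v = iou(box, gt)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j >= 0 and best_iou >= thr and not matched[best_j]:
            matched[best_j] = True
            hits.append(1)
        else:
            hits.append(0)

    # Running precision and recall after each detection, in score order.
    precisions, recalls, tp = [], [], 0
    for k, h in enumerate(hits, start=1):
        tp += h
        precisions.append(tp / k)
        recalls.append(tp / len(gts))

    # Make precision non-increasing from right to left (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas under the precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


# Hypothetical example: two ground-truth objects, three detections.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 31))]
print(round(average_precision(dets, gts), 4))  # 0.8333
```

In this example the second detection overlaps no ground truth and counts as a false positive, which lowers the precision at full recall to 2/3. Small objects make the 0.5 threshold harder to meet, because a shift of a few pixels changes the IoU of a small box far more than that of a large one.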

Data availability

The code will be available soon at https://github.com/Gentle-Hui/SOD-YOLOv7.

Funding

This work was supported by the Natural Science Basic Research Program of Shaanxi under Grant 2023-JC-YB-826, the Scientific Research Program Funded by Shaanxi Provincial Education Department under Grant 22JP028, and the Joint Foundation of Shaanxi Computer Society & Xi'an Xiangteng Microelectronics Technology Co., Ltd. under Grant XT-QC-202309-119287.

Author information

Authors and Affiliations

The Shaanxi Key Laboratory of Clothing Intelligence, School of Computer Science, Xi’an Polytechnic University, Xi’an, 710048, China
Jinguang Chen, Ronghui Wen & Lili Ma

Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Jinguang Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

The authors have no competing interests to declare that are relevant to the content of this article.

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, J., Wen, R. & Ma, L. Small object detection model for UAV aerial image based on YOLOv7. SIViP 18, 2695–2707 (2024). https://doi.org/10.1007/s11760-023-02941-0

Received: 07 October 2023
Revised: 15 November 2023
Accepted: 05 December 2023
Published: 29 December 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s11760-023-02941-0
