Modified YOLOv5 for small target detection in aerial images

Multimedia Tools and Applications

Abstract

Object detection is an important field in computer vision. Detecting objects in aerial images is especially challenging: objects can be very small relative to the image, can appear in any orientation, and, depending on the altitude, the same object can appear at different sizes. YOLOv5 is a recent object detection algorithm that offers a good balance of accuracy and speed. This work enhances YOLOv5 specifically for small target detection. Accuracy on small objects is improved by adding a new feature fusion layer to the feature pyramid part of YOLOv5 and by using compound scaling to increase the input size. The modified YOLOv5 achieves an 11% improvement in mAP 0.5 on the small vehicle class of the DOTA dataset while requiring 25% fewer GFLOPs and delivering a 10.52% faster inference time, making it well suited to real-time applications. On the challenging VisDrone dataset, the modified YOLOv5 achieves 45.2% mAP 0.5, compared with 31.7% mAP 0.5 for YOLOv5. The modified YOLOv5 outperforms many state-of-the-art algorithms in small target detection in aerial images. In addition to the performance evaluation, we present an analysis of object sizes, measured as pixel areas, in the VisDrone and DOTA datasets. The proposed modifications show the potential for significant gains in small target detection in aerial images and offer insights for further research in this area.
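
The object-size analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes annotations in the YOLO text format (class, x_center, y_center, width, height, all normalized to the image size) and borrows the COCO convention of calling a box small below 32² pixels and large from 96² pixels upward. The function names and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

# COCO-style size thresholds in square pixels (an assumption, not taken from the paper).
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def box_pixel_area(norm_w, norm_h, img_w, img_h):
    """Pixel area of a box whose width and height are normalized to [0, 1]."""
    return (norm_w * img_w) * (norm_h * img_h)


def size_bucket(area):
    """Map a pixel area to 'small', 'medium', or 'large'."""
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def size_histogram(label_lines, img_w, img_h):
    """Count boxes per size bucket from YOLO-format label lines."""
    counts = Counter()
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        norm_w, norm_h = float(parts[3]), float(parts[4])
        area = box_pixel_area(norm_w, norm_h, img_w, img_h)
        counts[size_bucket(area)] += 1
    return counts


# Hypothetical labels for one 1920x1080 image (values invented for illustration).
example_labels = [
    "0 0.50 0.50 0.010 0.020",  # about 19 x 22 px
    "0 0.25 0.40 0.030 0.050",  # about 58 x 54 px
    "1 0.70 0.60 0.200 0.300",  # 384 x 324 px
]
hist = size_histogram(example_labels, img_w=1920, img_h=1080)
print(dict(hist))
```

Running the same function over every label file in a dataset split would give a rough picture of how strongly the data is skewed toward small objects, which is the kind of evidence that motivates adding a higher-resolution feature fusion layer.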

Data Availability Statement

The data required for reproducing the results in this work is available at https://github.com/inderpreet1390/YOLOv5-small-target.

Author information

Corresponding author

Correspondence to Inderpreet Singh.

Ethics declarations

Conflict of Interest Statement

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Nomenclature table

Table 7 Nomenclature Table

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Singh, I., Munjal, G. Modified YOLOv5 for small target detection in aerial images. Multimed Tools Appl 83, 53221–53242 (2024). https://doi.org/10.1007/s11042-023-17625-7

  • DOI: https://doi.org/10.1007/s11042-023-17625-7
