Modified YOLOv5 for small target detection in aerial images

Singh, Inderpreet; Munjal, Geetika

doi:10.1007/s11042-023-17625-7

Published: 16 November 2023

Volume 83, pages 53221–53242, (2024)

Multimedia Tools and Applications

Abstract

Object detection is an important field in computer vision. Detecting objects in aerial images is extremely challenging: the objects can be very small relative to the image, they can have any orientation, and the same object can appear at different sizes depending on the altitude. YOLOv5 is a recent object detection algorithm that offers a good balance of accuracy and speed. This work focuses on enhancing YOLOv5 specifically for small target detection. Accuracy on small objects is improved by adding a new feature fusion layer to the feature pyramid part of YOLOv5 and by using compound scaling to increase the input size. The modified YOLOv5 demonstrates an 11% improvement in mAP 0.5 on the small vehicle class of the DOTA dataset while requiring 25% fewer GFLOPs and achieving a 10.52% faster inference time, making it well suited for real-time applications. Furthermore, the modified YOLOv5 achieves 45.2% mAP 0.5 on the challenging VisDrone dataset, compared to 31.7% mAP 0.5 for YOLOv5. The modified YOLOv5 outperforms many state-of-the-art algorithms in small target detection in aerial images. In addition to the performance evaluation, we present an analysis of object sizes, measured as pixel areas, in the VisDrone and DOTA datasets. The proposed modifications demonstrate the potential for significant advances in small target detection in aerial images and provide insights for further research in this area.
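The object-size analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes YOLO-style normalized labels (class, x_center, y_center, width, height) and borrows the COCO size buckets (small below 32×32 pixels, medium below 96×96 pixels, large otherwise). The function names and the sample values are hypothetical and are not taken from the paper, which may bin object sizes differently.

```python
from collections import Counter

# COCO-style area thresholds in pixels. This is an assumption for
# illustration; the paper's own binning may differ.
SMALL_MAX_AREA = 32 * 32
MEDIUM_MAX_AREA = 96 * 96


def pixel_area(norm_w, norm_h, img_w, img_h):
    """Convert a normalized box width/height to an area in pixels."""
    return (norm_w * img_w) * (norm_h * img_h)


def size_bucket(area):
    """Assign a pixel area to a small/medium/large bucket."""
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def size_histogram(labels, img_w, img_h):
    """Count boxes per size bucket for one image.

    labels: iterable of (class_id, x_center, y_center, width, height),
    with the four box values normalized to [0, 1].
    """
    counts = Counter()
    for _cls, _xc, _yc, w, h in labels:
        counts[size_bucket(pixel_area(w, h, img_w, img_h))] += 1
    return counts


# Hypothetical labels for a 1920x1080 frame (illustrative values only).
example_labels = [
    (0, 0.10, 0.20, 0.010, 0.020),  # about 19 x 22 px
    (0, 0.50, 0.50, 0.030, 0.060),  # about 58 x 65 px
    (1, 0.70, 0.30, 0.100, 0.200),  # 192 x 216 px
]
print(size_histogram(example_labels, 1920, 1080))
```

Summing such per-image histograms over a dataset gives a rough picture of how many annotated objects fall into the small bucket, which is the regime the modified network targets.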

Data Availability Statement

The data required for reproducing the results in this work is available at https://github.com/inderpreet1390/YOLOv5-small-target.

Author information

Authors and Affiliations

Amity School of Engineering and Technology, Amity University, Noida, 201303, U.P., India
Inderpreet Singh & Geetika Munjal

Corresponding author

Correspondence to Inderpreet Singh.

Ethics declarations

Conflict of Interest Statement

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Nomenclature table

Table 7 Nomenclature Table

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Singh, I., Munjal, G. Modified YOLOv5 for small target detection in aerial images. Multimed Tools Appl 83, 53221–53242 (2024). https://doi.org/10.1007/s11042-023-17625-7

Received: 17 April 2023
Revised: 05 October 2023
Accepted: 25 October 2023
Published: 16 November 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s11042-023-17625-7
