YOLO series algorithms in object detection of unmanned aerial vehicles: a survey

  • Special Issue Paper
  • Published in: Service Oriented Computing and Applications

Abstract

YOLO series algorithms are widely used for object detection on unmanned aerial vehicles (UAVs) because they are fast and lightweight. This article summarizes the key concepts of the YOLO series, such as the anchor mechanism, the feature fusion strategy, and the bounding box regression loss, and points out the advantages of these algorithms and where they can still be improved. It discusses the techniques behind YOLOv1 through YOLOv7 in detail in three parts (basic structure, strengths, and weaknesses) and compares the performance of the algorithms. On this basis, and in light of the challenges that object detection faces in UAV applications, it presents various approaches that improve YOLO series algorithms and apply them to UAV object detection scenarios, and it summarizes the improvement strategies, application scenarios, academic contributions, and limitations of these approaches. Finally, it outlines future development directions and challenges for applying YOLO series algorithms to UAV object recognition.
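The abstract names the bounding box regression loss as one of the key concepts of the YOLO series. As a minimal sketch, not taken from the article, the Python below computes IoU and two of its common extensions, GIoU and DIoU, for axis-aligned boxes; the corresponding losses are usually written as 1 minus the value. The `(x1, y1, x2, y2)` box format and all function names are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def _inter_union(a: Box, b: Box) -> Tuple[float, float]:
    """Return (intersection area, union area) of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter, area_a + area_b - inter


def iou(a: Box, b: Box) -> float:
    """Intersection over union, in [0, 1]."""
    inter, union = _inter_union(a, b)
    return inter / union if union > 0 else 0.0


def giou(a: Box, b: Box) -> float:
    """Generalized IoU: IoU minus the share of the enclosing box not covered by the union."""
    inter, union = _inter_union(a, b)
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    if union <= 0 or c_area <= 0:
        return 0.0
    return inter / union - (c_area - union) / c_area


def diou(a: Box, b: Box) -> float:
    """Distance IoU: IoU minus squared center distance over squared enclosing-box diagonal."""
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    diag2 = cw * cw + ch * ch
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    if diag2 <= 0:
        return iou(a, b)
    return iou(a, b) - (dx * dx + dy * dy) / diag2


if __name__ == "__main__":
    pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
    for name, fn in (("IoU", iou), ("GIoU", giou), ("DIoU", diou)):
        print(f"{name} loss: {1.0 - fn(pred, target):.4f}")
```

For two boxes that do not overlap, plain IoU is 0 regardless of how far apart they are, whereas GIoU and DIoU still change with the distance between the boxes, which is why they are often preferred as regression losses.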

Abbreviations

AP: Average precision
BN: Batch normalization
CBAM: Convolutional block attention module
CBL: Convolution + BN + Leaky ReLU activation function
CBM: Convolution + BN + Mish activation function
CIoU: Complete IoU
CNN: Convolutional neural network
CSP: Cross stage partial
DIoU: Distance IoU
FLOPS: Floating point operations per second
FPN: Feature pyramid network
FPS: Frames per second
GIoU: Generalized IoU
IoU: Intersection over union
mAP: Mean average precision
MSE: Mean square error
NMS: Non-maximum suppression
OTA: Optimal transport assignment
PANet: Path aggregation network
R-CNN: Regions with CNN features
ResNet: Residual network
RPN: Region proposal network
SAM: Spatial attention module
SIoU: SCYLLA-IoU
SPP: Spatial pyramid pooling
SSD: Single shot MultiBox detector
TAL: Task alignment learning
TAP: Task-aligned predictors
TOOD: Task-aligned one-stage object detection
VGGNet: Visual geometry group network
YOLO: You only look once
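Two of the abbreviations above, IoU and NMS, are often used together when detector outputs are post-processed. The following is a minimal sketch of greedy non-maximum suppression in plain Python, under the assumption that boxes are given as `(x1, y1, x2, y2)` tuples with one confidence score each; the function names and the default threshold of 0.5 are illustrative and are not taken from the article.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit boxes from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed at a threshold of 0.5, while the third box does not overlap either of them and is kept.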

Acknowledgements

The authors extend their sincere appreciation to the Faculty of Information Sciences and Engineering and the Software Engineering and Digital Innovation Center at Management and Science University, Malaysia, for their invaluable assistance and support during the course of this study. The authors gratefully acknowledge the financial support of the Sichuan Provincial Intellectual Property Special Fund Project under project number 2022-ZS-00156.

Author information

Corresponding author

Correspondence to Muhammad Irsyad Abdullah.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiao, L., Abdullah, M.I. YOLO series algorithms in object detection of unmanned aerial vehicles: a survey. SOCA (2024). https://doi.org/10.1007/s11761-024-00388-w

  • DOI: https://doi.org/10.1007/s11761-024-00388-w
