Abstract
YOLO series algorithms are widely used in unmanned aerial vehicle (UAV) object detection because they are fast and lightweight. This article summarizes key concepts of the YOLO series, such as the anchor mechanism, feature fusion strategy, and bounding box regression loss, and identifies both the advantages of YOLO series algorithms and where they can be improved. The technologies of YOLOv1 through YOLOv7 are discussed in detail in three parts: basic structure, strengths and weaknesses, and a comparison of algorithm performance. Building on this, and considering the challenges that object detection faces in UAV applications, the article presents various solutions that improve YOLO series algorithms and apply them to UAV object detection. The improvement strategies, application scenarios, academic contributions, and limitations of these algorithms are summarized. Finally, future development directions and open challenges in applying YOLO series algorithms to UAV object recognition are discussed.
Abbreviations
- AP: Average precision
- BN: Batch normalization
- CBAM: Convolutional block attention module
- CBL: Convolution + BN + Leaky ReLU activation function
- CBM: Convolution + BN + Mish activation function
- CIoU: Complete IoU
- CNN: Convolutional neural network
- CSP: Cross stage partial
- DIoU: Distance IoU
- FLOPS: Floating point operations per second
- FPN: Feature pyramid network
- FPS: Frames per second
- GIoU: Generalized IoU
- IoU: Intersection over union
- mAP: Mean average precision
- MSE: Mean squared error
- NMS: Non-maximum suppression
- OTA: Optimal transport assignment
- PANet: Path aggregation network
- R-CNN: Regions with CNN features
- ResNet: Residual network
- RPN: Region proposal network
- SAM: Spatial attention module
- SIoU: SCYLLA-IoU
- SPP: Spatial pyramid pooling
- SSD: Single shot MultiBox detector
- TAL: Task alignment learning
- TAP: Task-aligned predictors
- TOOD: Task-aligned one-stage object detection
- VGGNet: Visual geometry group network
- YOLO: You only look once
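The IoU-family entries above (IoU, GIoU, DIoU, CIoU) are overlap measures on which several YOLO versions base their bounding box regression losses, typically in the form 1 - metric. As an illustration only, the Python sketch below computes each measure for a pair of axis-aligned boxes. It assumes a corner format (x1, y1, x2, y2) with positive width and height; the function names and box format are hypothetical choices for this example, not taken from any YOLO codebase.

```python
import math
from typing import Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corners,
# with x2 > x1 and y2 > y1 (positive width and height).
Box = Tuple[float, float, float, float]


def _inter_union(a: Box, b: Box) -> Tuple[float, float]:
    """Return (intersection area, union area) of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter, area_a + area_b - inter


def _enclosing_wh(a: Box, b: Box) -> Tuple[float, float]:
    """Width and height of the smallest box C that covers both a and b."""
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return cw, ch


def iou(a: Box, b: Box) -> float:
    """IoU = |A ∩ B| / |A ∪ B|."""
    inter, union = _inter_union(a, b)
    return inter / union


def giou(a: Box, b: Box) -> float:
    """GIoU = IoU - |C minus (A ∪ B)| / |C|; stays informative when boxes do not overlap."""
    inter, union = _inter_union(a, b)
    cw, ch = _enclosing_wh(a, b)
    c_area = cw * ch
    return inter / union - (c_area - union) / c_area


def diou(a: Box, b: Box) -> float:
    """DIoU = IoU - rho^2 / c^2 (squared centre distance over squared enclosing-box diagonal)."""
    cw, ch = _enclosing_wh(a, b)
    c2 = cw ** 2 + ch ** 2
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    return iou(a, b) - (dx ** 2 + dy ** 2) / c2


def ciou(pred: Box, gt: Box) -> float:
    """CIoU = DIoU - alpha * v, where v penalises aspect-ratio mismatch."""
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    i = iou(pred, gt)
    # alpha = v / ((1 - IoU) + v); guard the 0/0 case when v is zero
    alpha = v / ((1 - i) + v) if v > 0 else 0.0
    return diou(pred, gt) - alpha * v


if __name__ == "__main__":
    pred, gt = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
    # A regression loss would typically be 1 - metric, e.g. 1 - ciou(pred, gt).
    print(f"{iou(pred, gt):.3f} {giou(pred, gt):.3f} "
          f"{diou(pred, gt):.3f} {ciou(pred, gt):.3f}")
    # → 0.143 -0.079 0.032 0.032
```

The plain-float version keeps the formulas readable; a training pipeline would instead vectorise them over tensors of boxes and add a small epsilon to each denominator for numerical stability.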
Acknowledgements
The authors extend their sincere appreciation to the Faculty of Information Sciences and Engineering and the Software Engineering and Digital Innovation Center at Management and Science University, Malaysia, for their invaluable assistance and support during this study. The authors gratefully acknowledge financial support from the Sichuan Provincial Intellectual Property Special Fund Project (project number 2022-ZS-00156).
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jiao, L., Abdullah, M.I. YOLO series algorithms in object detection of unmanned aerial vehicles: a survey. SOCA (2024). https://doi.org/10.1007/s11761-024-00388-w
DOI: https://doi.org/10.1007/s11761-024-00388-w