Skip to main content
Log in

YOLO-based Object Detection Models: A Review and its Applications

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

In computer vision, object detection is the classical and most challenging problem to get accurate results in detecting objects. With the significant advancement of deep learning techniques over the past decades, most researchers work on enhancing object detection, segmentation and classification. Object detection performance is measured in both detection accuracy and inference time. The detection accuracy in two stage detectors is better than single stage detectors. In 2015, the real-time object detection system YOLO was published, and it rapidly grew its iterations, with the newest release, YOLOv8 in January 2023. The YOLO achieves a high detection accuracy and inference time with single stage detector. Many applications easily adopt YOLO versions due to their high inference speed. This paper presents a complete survey of YOLO versions up to YOLOv8. This article begins with explained about the performance metrics used in object detection, post-processing methods, dataset availability and object detection techniques that are used mostly; then discusses the architectural design of each YOLO version. Finally, the diverse range of YOLO versions was discussed by highlighting their contributions to various applications.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17

Similar content being viewed by others

References

  1. Matsuzaka Y, Yashiro R (2023). AI-Based Computer Vision Techniques and Expert Systems. AI, 4(1), 289-302.

  2. Soviany P, Ionescu RT (2018). Optimizing the trade-off between single-stage and two-stage deep object detectors using image difficulty prediction. In: 2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) (pp. 209-214). IEEE

  3. Harzallah H, Jurie F, Schmid C (2009). Combining efficient object localization and image classification. In 2009 IEEE 12th international conference on computer vision (pp. 237-244). IEEE.

  4. Zhao ZQ, Zheng P, Xu ST, Wu X (2019) Object detection with deep learning: A review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232

    Article  PubMed  Google Scholar 

  5. Khurana K, Awasthi R (2013) Techniques for object recognition in images and multi-object detection. Int J Adv Res Comput Eng Technol (IJARCET) 2(4):1383–1388

    Google Scholar 

  6. Yuan L, Lu F (2018). Real-time ear detection based on embedded systems. In: 2018 International Conference on Machine Learning and Cybernetics (ICMLC) (Vol. 1, pp. 115-120). IEEE

  7. Nayagam MG, Ramar K (2015) A survey on real time object detection and tracking algorithms. Int J Appl Eng Res 10(9):8290–8297

    Google Scholar 

  8. Varma S, Sreeraj M (2013). Object detection and classification in surveillance system. In 2013 IEEE Recent Advances in Intelligent Computational Systems (RAICS) (pp. 299-303). IEEE

  9. Verma NK, Sharma T, Rajurkar SD, Salour A (2016). Object identification for inventory management using convolutional neural network. In 2016 IEEE Applied Imagery Pattern Recognition Workshop (AIPR) (pp. 1-6). IEEE

  10. Rana M, Bhushan M (2023) Machine learning and deep learning approach for medical image analysis: diagnosis to detection. Multimed Tools Appli 82(17):26731–26769

    Article  Google Scholar 

  11. Raab D, Fezer E, Breitenbach J, Baumgartl H, Sauter D, Buettner R (2022). A Deep Learning-Based Model for Automated Quality Control in the Pharmaceutical Industry. In: 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC) (pp. 266-271). IEEE

  12. Viola P, Jones M (2001) Robust real-time object detection. Int J Comput Vision 4(34–47):4

    Google Scholar 

  13. Lingani GM, Rawat DB Garuba M (2019). Smart traffic management system using deep learning for smart city applications. In:2019 IEEE 9th annual computing and communication workshop and conference (CCWC) (pp. 0101-0106). IEEE.

  14. Durai SKS, Shamili MD (2022) Smart farming using machine learning and deep learning techniques. Decision Analy J 3:100041

    Article  Google Scholar 

  15. Nguyen HAT, Sophea T, Gheewala SH, Rattanakom R, Areerob T, Prueksakorn K (2021) Integrating remote sensing and machine learning into environmental monitoring and assessment of land use change. Sustain Prod Consumpt 27:1239–1254

    Article  Google Scholar 

  16. F1 score- https://encord.com/blog/f1-score-in-machine-learning/#:~:text=This%20is%20because%20the%20regular,the%20majority%20class's%20strong%20influence. Accessed 20 Jan 2024

  17. IoU- https://towardsdatascience.com/map-mean-average-precision-might-confuse-you-5956f1bfa9e2. Accessed 12 Sept 2023

  18. Jiang Y, Qiu H, McCartney M, Sukhatme G, Gruteser M, Bai F, ..., Govindan R (2015). Carloc: Precise positioning of automobiles. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems (pp. 253-265)

  19. Padilla R, Netto SL, Da Silva EA (2020). A survey on performance metrics for object-detection algorithms. In 2020 international conference on systems, signals and image processing (IWSSIP) (pp. 237-242). IEEE.

  20. Hosang J, Benenson R, Schiele B (2017) Learning non-maximum suppression. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4507-4515

  21. Wei X, Zhang H, Liu S, Lu Y (2020) Pedestrian detection in underground mines via parallel feature transfer network. Pattern Recog 103:107195

    Article  Google Scholar 

  22. Vennelakanti A, Shreya S, Rajendran R, Sarkar, Muddegowda D, Hanagal P (2019) Traffic sign detection and recognition using a CNN ensemble. In 2019 IEEE international conference on consumer electronics (ICCE) (pp. 1-4). IEEE

  23. Umer S, Rout RK, Pero C, Nappi M (2022). Facial expression recognition with trade-offs between data augmentation and deep learning features. J Ambient Intel Humanized Comput. 1-15

  24. Shao S, Li Z, Zhang T, Peng C, Yu G, Zhang X, ..., & Sun J (2019). Objects365: A large-scale, high-quality dataset for object detection. In: Proceedings of the IEEE/CVF international conference on computer vision (pp. 8430-8439).

  25. Fregin A, Muller J, Krebel U, Dietmayer K (2018) The driveu traffic light dataset: Introduction and comparison with existing datasets. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 3376-3383). IEEE.

  26. Deng J, Dong W, Socher R, Li L. J., Li K, Fei-Fei L (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248-255). IEEE.

  27. Tousch AM, Herbin S, Audibert JY (2012) Semantic hierarchies for image annotation: A survey. Patt Recog 45(1):333–345

    Article  ADS  Google Scholar 

  28. Manikandan NS, Ganesan K (2019). Deep learning based automatic video annotation tool for self-driving car. arXiv preprint arXiv:1904.12618

  29. Labelimg (2022), https://github.com/HumanSignal/labelImg. Accessed 28 Sept 2023

  30. Makesense (2021), https://github.com/peng-zhihui/Make-Sense. Accessed 29 Sept 2023

  31. Roboflow (2020), https://roboflow.com/. Accessed 29 Sept 2023

  32. LabelBox (2018), https://labelbox.com/product/annotate/. Accessed 5 Oct 2023

  33. Russell BC, Torralba A, Murphy KP, Freeman WT (2008) LabelMe: a database and web-based tool for image annotation. Int J Comput Vision 77:157–173

    Article  Google Scholar 

  34. CVAT (2023) https://github.com/opencv/cvat. Accessed 5 Oct 2023

  35. VoTT (visual object tagging tool) (2019), https://github.com/microsoft/VoTT/blob/master/README.md. Accessed 11 Oct 2023

  36. CIFAR-10 Dataset. https://www.cs.toronto.edu/~kriz/cifar.html. Accessed 25 Oct 2023

  37. Doon R, Rawat TK, Gautam S (2018) Cifar-10 classification using deep convolutional neural network. In 2018 IEEE Punecon (pp. 1-5). IEEE

  38. Imagenet Dataset, https://www.image-net.org/download.php. Accessed 28 Oct 2023

  39. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, ..., & Zitnick CL (2014). Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13 (pp. 740-755). Springer International Publishing.

  40. Veit A, Matera T, Neumann L, Matas J, Belongie S (2016) Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140

  41. Kuznetsova A Rom H, Alldrin N, Uijlings J, Krasin I, Pont-Tuset J, ... , Ferrari V (2020). The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. International Journal of Computer Vision 128(7), 1956-1981.

  42. Cheng G, Han J (2016) A survey on object detection in optical remote sensing images. ISPRS J Photogram Remote Sens 117:11–28

    Article  ADS  Google Scholar 

  43. Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS Journal of Photogram Remote Sens 159:296–307

    Article  ADS  Google Scholar 

  44. Razakarivony S, Jurie F (2016) Vehicle detection in aerial imagery: A small target detection benchmark. J Vis Commun Image Represent 34:187–203

    Article  Google Scholar 

  45. Ch'ng CK, Chan CS (2017) Total-text: A comprehensive dataset for scene text detection and recognition. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR) (Vol. 1, pp. 935-942). IEEE.

  46. Grosicki E, El-Abed H (2011) Icdar 2011-french handwriting recognition competition. In 2011 International Conference on Document Analysis and Recognition (pp. 1459-1463). IEEE.

  47. Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014). Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227

  48. Zhang S, Benenson R, Schiele B (2017) Citypersons: A diverse dataset for pedestrian detection. In Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3213-3221

  49. Neumann L, Karg M, Zhang S, Scharfenberger C, Piegert E, Mistr S, ... ,Schiele B (2019). Nightowls: A pedestrians at night dataset. In Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part I 14 (pp. 691-705). Springer International Publishing.

  50. Dollar P, Wojek C, Schiele B, Perona P (2011) Pedestrian detection: An evaluation of the state of the art. IEEE Trans Patt Analy Machine Intel 34(4):743–761

    Article  Google Scholar 

  51. Søgaard A, Plank B, Hovy D (2014) Selection bias, label bias, and bias in ground truth. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Tutorial Abstracts. pp. 11-13

  52. Wu X, Sahoo D, Hoi SC (2020) Recent advances in deep learning for object detection. Neurocomput 396:39–64

    Article  Google Scholar 

  53. Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vision 57:137–154

    Article  Google Scholar 

  54. Viola P, Jones M (2001). Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001 (Vol. 1, pp. I-I).Ieee.

  55. Zhang H, Hong X (2019) Recent progresses on object detection: a brief review. Multimed Tools Appli 78:27809–27847

    Article  Google Scholar 

  56. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016). Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14 (pp. 21-37). Springer International Publishing.[85]

  57. Fu J, Zhao C, Xia Y, Liu W (2020) Vehicle and wheel detection: a novel SSD-based approach and associated large-scale benchmark dataset. Multimed Tools Appli 79:12615–12634

    Article  Google Scholar 

  58. Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision. 2980-2988

  59. Nguyen ND, Do T, Ngo TD, Le DD (2020) An evaluation of deep learning methods for small object detection. J Electric Comput Eng 2020:1–18

    Article  Google Scholar 

  60. Zhou J, Tian Y, Li W, Wang R, Luan Z, Qian D (2019) LADet: A light-weight and adaptive network for multi-scale object detection. In Asian Conference on Machine Learning. 912-923. PMLR

  61. Aziz L, Salam MSBH, Sheikh UU, Ayub S (2020) Exploring deep learning-based architecture, strategies, applications and current trends in generic object detection: A comprehensive review. IEEE Access 8:170461–170495

    Article  Google Scholar 

  62. Girshick R, Donahue J, Darrell T, Malik J (2015) Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans Patt Analy Machine Intel 38(1):142–158

    Article  Google Scholar 

  63. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Patt Analy Machine Intel 37(9):1904–1916

    Article  Google Scholar 

  64. Girshick R (2015). Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision. 1440-1448

  65. Ren S, He K Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28.

  66. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision. 2961-2969

  67. Kachouane M, Sahki S, Lakrouf M, Ouadah N (2012) HOG based fast human detection. In: 2012 24th International Conference on Microelectronics (ICM) (pp. 1-4). IEEE.

  68. Cucliciu T, Lin CY, Muchtar K (2017). A DPM based object detector using HOG-LBP features. In: 2017 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW) (pp. 315-316). IEEE

  69. Salari A, Djavadifar A, Liu X, Najjaran H (2022) Object recognition datasets and challenges: A review. Neurocomputing 495:129–152

    Article  Google Scholar 

  70. Object detection- https://www.frontiersin.org/articles/10.3389/frobt.2015.00029/full. Accessed 11 Nov 2023

  71. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition. 779-788

  72. Everingham M, Eslami SA, Van Gool L, Williams CK, Winn J, Zisserman A (2015) The pascal visual object classes challenge: A retrospective. Int J Comput Vis 111:98–136

    Article  Google Scholar 

  73. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition. 7263-7271

  74. Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767

  75. Furusho Y, Ikeda K (2020) Theoretical analysis of skip connections and batch normalization from generalization and optimization perspectives. APSIPA Transactions on Signal and Information Processing 9

  76. Bochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934

  77. Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2020) Distance-IoU loss: Faster and better learning for bounding box regression. In: Proceedings of the AAAI conference on artificial intelligence 34(07): 12993-13000

  78. IoU loss function: https://learnopencv.com/iou-loss-functions-object-detection/#ciou-complete-iou-loss. Accessed 14 Nov 2023

  79. Jocher G (2020) YOLOv5 by Ultralytics. https://github.com/ultralytics/yolov5. Accessed 12 Jan 2024

  80. Ghiasi G, Cui Y, Srinivas A, Qian R, Lin TY, Cubuk ED, ..., Zoph B (2021). Simple copy-paste is a strong data augmentation method for instance segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2918-2928

  81. Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2017) mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412

  82. Li C, Li L, Jiang H, Weng K, Geng Y, Li L, Wei X (2022). YOLOv6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976

  83. Zhang H, Wang Y, Dayoub F, Sunderhauf N (2021) Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 8514-8523

  84. Li X, Wang W, Wu L, Chen S, Hu X, Li J, ..., Yang J (2020) Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Advances in Neural Information Processing Systems, 33, 21002-21012.

  85. Feng C, Zhong Y, Gao Y, Scott MR, Huang W (2021) Tood: Task-aligned one-stage object detection. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (pp. 3490-3499). IEEE Computer Society

  86. Shu C, Liu Y, Gao J, Yan Z, Shen C (2021) Channel-wise knowledge distillation for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5311-5320

  87. Ding X, Chen H, Zhang X, Huang, K, Han J, Ding G (2022) Re-parameterizing your optimizers rather than architectures. arXiv preprint arXiv:2205.15242

  88. Wang CY, Bochkovskiy A, Liao HYM (2023) YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7464-7475

  89. Ding X, Zhang X, Ma N, Han J, Ding G, Sun J (2021). Repvgg: Making vgg-style convnets great again. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 13733-13742

  90. Yolov8- https://sandar-ali.medium.com/ultralytics-unveiled-yolov8-on-january-10-2023-which-has-garnered-over-one-million-downloads-338d8f11ec5. Accessed 20 Jan 2024

  91. Nanni L, Ghidoni S, Brahnam S (2017) Handcrafted vs. non-handcrafted features for computer vision classification. Pattern Recog 71:158–172

    Article  ADS  Google Scholar 

  92. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 580-587

  93. Jamtsho Y, Riyamongkol P, Waranusast R (2021) Real-time license plate detection for non-helmeted motorcyclist using YOLO. Ict Express 7(1):104–109

    Article  Google Scholar 

  94. Han X, Chang J, Wang K (2021) Real-time object detection based on YOLO-v2 for tiny vehicle object. Procedia Comput Sci 183:61–72

    Article  Google Scholar 

  95. Sahin O, Ozer S (2021) Yolodrone: Improved yolo architecture for object detection in drone images. In: 2021 44th International Conference on Telecommunications and Signal Processing (TSP) (pp. 361-365). IEEE

  96. Ma D, Fang H, Wang N, Zhang C, Dong J, Hu H (2022) Automatic detection and counting system for pavement cracks based on PCGAN and YOLO-MF. IEEE Trans Intel Transport Syst 23(11):22166–22178

    Article  Google Scholar 

  97. Wu D, Lv S, Jiang M, Song H (2020) Using channel pruning-based YOLO v4 deep learning algorithm for the real-time and accurate detection of apple flowers in natural environments. Comput Electron Agriculture 178:105742

    Article  Google Scholar 

  98. Dewi C, Chen RC, Jiang X, Yu H (2022) Deep convolutional neural network for enhancing traffic sign recognition developed on Yolo V4. Multimed Tools Appli 81(26):37821–37845

    Article  Google Scholar 

  99. Bhambani, K., Jain, T., & Sultanpure, K. A. (2020, October). Real-time face mask and social distancing violation detection system using yolo. In 2020 IEEE Bangalore Humanitarian Technology Conference (B-HTC) (pp. 1-6). IEEE.

  100. Ficzere M, Mészáros LA, Kállai-Szabó N, Kovács A, Antal I, Nagy ZK, Galata DL (2022) Real-time coating thickness measurement and defect recognition of film coated tablets with machine vision and deep learning. Int J Pharm 623:121957

    Article  CAS  PubMed  Google Scholar 

  101. Kang L, Lu Z, Meng L, Gao Z (2024) YOLO-FA: Type-1 fuzzy attention based YOLO detector for vehicle detection. Expert Syst Appli 237:121209

    Article  Google Scholar 

  102. Wang Y, Wang H, Xin Z (2022) Efficient detection model of steel strip surface defects based on YOLO-V7. IEEE Access 10:133936–133944

    Article  Google Scholar 

  103. Wang CY, Liao HYM, Yeh IH (2022) Designing network design strategies through gradient path analysis. arXiv preprint arXiv:2211.04800

  104. Jocher G, Chaurasia A, Qiu J (2023) YOLO by Ultralytics. https://github.com/ultralytics/ultralytics. Accessed 21 Jan 2024

  105. Cui Y, Yan L, Cao Z, Liu D. (2021). Tf-blender: Temporal feature blender for video object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 8138-8147)

  106. Yan L, Ma S, Wang Q, Chen Y, Zhang X, Savakis A, Liu D (2022) Video captioning using global-local representation. IEEE Trans Circuits Syst for Video Technol 32(10):6642–6656

    Article  Google Scholar 

  107. Yan L, Wang Q, Ma S, Wang J, Yu C (2022) Solve the puzzle of instance segmentation in videos: A weakly supervised framework with spatio-temporal collaboration. IEEE Trans Circuits Syst Video Technol 33(1):393–406

    Article  Google Scholar 

Download references

Acknowledgements

The authors are grateful to the Ministry of Heavy Industries (MHI), Government of India, for their funding support under the Scheme for Enhancement of Competitiveness in the Indian Capital Goods Sector Phase II. The authors express their gratitude to SASTRA Deemed University, Thanjavur, for providing the infrastructural facilities to carry out this research work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Subramaniyaswamy Vairavasundaram.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Vijayakumar, A., Vairavasundaram, S. YOLO-based Object Detection Models: A Review and its Applications. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18872-y

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11042-024-18872-y

Keywords

Navigation