Improved YOLOv5 network for real-time multi-scale traffic sign detection

Wang, Junfan; Chen, Yi; Dong, Zhekang; Gao, Mingyu

doi:10.1007/s00521-022-08077-5

Original Article
Published: 09 December 2022

Volume 35, pages 7853–7865, (2023)

Neural Computing and Applications

Junfan Wang^1,2, Yi Chen^1,2, Zhekang Dong^1,2,3 & Mingyu Gao^1,2 (ORCID: orcid.org/0000-0003-4678-6937)

Abstract

Traffic sign detection is a challenging task for autonomous driving systems, particularly when targets span multiple scales and detection must run in real time. The scale of traffic signs varies widely, which degrades detection accuracy. Feature pyramids are widely used to address this problem, but because traffic sign sizes are so diverse, a feature pyramid cannot accurately extract multi-scale feature maps, which destroys the feature consistency between traffic signs. Moreover, in practical applications, common methods struggle to improve detection accuracy for multi-scale traffic signs while keeping detection real-time. In this paper, we propose an improved feature pyramid model, named AF-FPN, which uses an adaptive attention module (AAM) and a feature enhancement module (FEM) to reduce information loss during feature map generation and to enhance the representation ability of the feature pyramid. We replace the original feature pyramid network in YOLOv5 with AF-FPN, which improves the detection performance of YOLOv5 on multi-scale targets while preserving real-time detection. Furthermore, we propose a new automatic learning data augmentation method that enriches the dataset and improves the robustness of the model, making it more suitable for practical scenarios. Extensive experimental results on the Tsinghua-Tencent 100K (TT100K) dataset demonstrate that, compared with several state-of-the-art methods, our method generalizes better and achieves superior performance.
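
As a rough illustration of the generic feature pyramid idea that the abstract builds on, the sketch below fuses feature maps top-down: each coarser map is upsampled and added to the next finer one. It is not AF-FPN. The abstract does not specify how AAM or FEM work, so neither is modeled here, and the lateral 1x1 convolutions of a standard feature pyramid network are omitted for brevity. All function names are hypothetical, and the feature maps are assumed to be single-channel 2D grids whose size halves at each level.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2D feature map (assumption)


def upsample2x(fm: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: repeat every value twice per axis."""
    out: Grid = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two feature maps of identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(levels: List[Grid]) -> List[Grid]:
    """Fuse pyramid levels ordered fine -> coarse.

    The coarsest level is passed through unchanged; every finer level
    receives the upsampled, already-fused level above it.
    """
    fused = [levels[-1]]
    for fm in reversed(levels[:-1]):
        fused.append(add_maps(fm, upsample2x(fused[-1])))
    fused.reverse()  # restore fine -> coarse order
    return fused


if __name__ == "__main__":
    fine = [[1.0] * 4 for _ in range(4)]      # 4x4
    mid = [[10.0, 20.0], [30.0, 40.0]]        # 2x2
    coarse = [[100.0]]                        # 1x1
    for level in top_down_fuse([fine, mid, coarse]):
        print(level)
```

In this toy setup, coarse-level information propagates into every finer level, which is the property that lets a pyramid serve both small and large targets; a learned network would replace the plain addition with trained layers.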

Availability of data and material

Data are available upon request.

Code availability

Code is available upon request.

Acknowledgments

The authors would like to thank the editorial board and reviewers for their help in improving this paper.

Funding

This research was funded by the Zhejiang Provincial Key Lab of Equipment Electronics (No. 2019E10009), and the Key Research and Development Program of Zhejiang Province (No. 2020C01110).

Author information

Authors and Affiliations

School of Electronics Information, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
Junfan Wang, Yi Chen, Zhekang Dong & Mingyu Gao
Zhejiang Provincial Key Lab of Equipment Electronics, Hangzhou, 310018, Zhejiang, China
Junfan Wang, Yi Chen, Zhekang Dong & Mingyu Gao
Department of Electronic Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
Zhekang Dong

Contributions

Conceptualization was contributed by Junfan Wang and Mingyu Gao; methodology by Yi Chen and Zhekang Dong; formal analysis and investigation by Junfan Wang and Yi Chen; writing (original draft preparation) by Junfan Wang and Yi Chen; writing (review and editing) by Junfan Wang; funding acquisition by Mingyu Gao; resources by Zhekang Dong; and supervision by Mingyu Gao.

Corresponding author

Correspondence to Mingyu Gao.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Chen, Y., Dong, Z. et al. Improved YOLOv5 network for real-time multi-scale traffic sign detection. Neural Comput & Applic 35, 7853–7865 (2023). https://doi.org/10.1007/s00521-022-08077-5

Received: 29 December 2021
Accepted: 22 November 2022
Published: 09 December 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s00521-022-08077-5
