
YOLOF-F: you only look one-level feature fusion for traffic sign detection

  • Original article, published in The Visual Computer

Abstract

This paper proposes a detector that addresses multi-scale detection and improves performance on small traffic signs, which are hard to detect. The detector, called YOLOF-F (you only look one-level feature fusion), is a single-stage detector that extracts multi-scale feature information from a single fused feature level. First, we propose a feature fusion module (FFM) to fuse features at different scales. Next, we introduce a new encoder, the corner dilated encoder (CDE), which enhances corner information in the feature map and improves position-regression accuracy while maintaining a fast detection speed. Finally, YOLOF-F achieves 74.57% and 77.23% AP on the GTSDB and CTSD datasets, respectively, and runs at 32 FPS. Extensive experiments validate that YOLOF-F is faster and more effective than most traffic sign detection methods.
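The abstract does not give FFM's internal design, so the following is only a minimal illustration of the general idea it names: collapsing a multi-scale feature pyramid into a single fused level. The nearest-neighbour upsampling, the elementwise-sum fusion, and all function names here are assumptions for the sketch, not the paper's actual module.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_to_one_level(pyramid):
    """Fuse a coarse-to-fine feature pyramid into the finest level:
    upsample every map to the finest resolution, then sum elementwise."""
    target = pyramid[-1].shape[1]  # finest spatial size (square maps assumed)
    fused = np.zeros_like(pyramid[-1], dtype=np.float64)
    for feat in pyramid:
        factor = target // feat.shape[1]
        fused += upsample_nearest(feat, factor)
    return fused

# Toy pyramid: 2 channels at 4x4, 8x8, and 16x16 resolutions
pyramid = [np.ones((2, s, s)) for s in (4, 8, 16)]
fused = fuse_to_one_level(pyramid)
print(fused.shape)     # (2, 16, 16)
print(fused[0, 0, 0])  # 3.0 — each of the three levels contributes 1
```

A real implementation would typically use learned 1x1 convolutions to align channel counts and bilinear interpolation for upsampling; the sum here simply shows how every scale ends up represented at one resolution.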



Acknowledgements

This work was supported by the National Science Foundation of China under Grant U1803261 and funded by the National Natural Science Foundation of China (61966035), the Xinjiang Uygur Autonomous Region Innovation Team (XJEDU2017T002), the international cooperation project of the China-region Science and Technology Department "Data-driven China-Russia cloud computing sharing platform construction" (2020E01023), research on depth learning labeling methods based on multi-feature fusion (2020D01A34), and research on video information processing technology based on public safety (U1803261).

Author information


Corresponding author

Correspondence to Yurong Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest in this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wei, H., Zhang, Q., Qin, Y. et al. YOLOF-F: you only look one-level feature fusion for traffic sign detection. Vis Comput 40, 747–760 (2024). https://doi.org/10.1007/s00371-023-02813-1
