
Real-time detection network for tiny traffic sign using multi-scale attention module


Abstract

As one of the key technologies of intelligent vehicles, traffic sign detection remains a challenging task because of the tiny size of its target objects. To address this challenge, we present a novel detection network, improved from yolo-v3, for tiny traffic signs with high precision in real time. First, a visual multi-scale attention module (MSAM), a lightweight yet effective module, is devised to fuse multi-scale feature maps with channel weights and spatial masks. It increases the representational power of the network by emphasizing useful features and suppressing unnecessary ones. Second, we effectively exploit fine-grained features of tiny objects from the shallower layers by modifying the Darknet-53 backbone and adding one prediction head to yolo-v3. Finally, a receptive field block is added to the neck of the network to broaden the receptive field. Experiments demonstrate the effectiveness of our network both quantitatively and qualitatively: its mAP@0.5 reaches 0.965 and its detection speed is 55.56 FPS for 512 × 512 images on the challenging Tsinghua-Tencent 100K (TT100K) dataset.
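The abstract only outlines how the MSAM fuses multi-scale feature maps with channel weights and spatial masks. The following is a rough illustration of that general idea, written as a minimal PyTorch sketch under our own assumptions rather than the authors' implementation; the class name, layer sizes, pooling choices, and fusion order are hypothetical.

    # Sketch of an MSAM-style fusion block (illustrative, not the paper's code).
    # Assumptions: two feature maps with the same channel count but different
    # scales are fused after the coarser one is upsampled; channel weights come
    # from global average pooling, the spatial mask from a 7x7 convolution.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiScaleAttention(nn.Module):
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            # Channel attention: squeeze with global average pooling, then re-weight.
            self.channel_fc = nn.Sequential(
                nn.Conv2d(channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1),
            )
            # Spatial attention: a single-channel mask from pooled descriptors.
            self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

        def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
            # Upsample the deeper (coarser) map to the shallow map's resolution and fuse.
            deep = F.interpolate(deep, size=shallow.shape[-2:], mode="nearest")
            fused = shallow + deep
            # Channel weights emphasize informative channels.
            weights = torch.sigmoid(self.channel_fc(F.adaptive_avg_pool2d(fused, 1)))
            fused = fused * weights
            # Spatial mask emphasizes informative locations (e.g., tiny signs).
            avg_map = fused.mean(dim=1, keepdim=True)
            max_map, _ = fused.max(dim=1, keepdim=True)
            mask = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
            return fused * mask

In the full network, such a fused map would then feed a yolo-v3-style prediction head; the sketch shows only the attention-based fusion step described in the abstract.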




Author information


Correspondence to TingTing Yang or Chao Tong.

Additional information

This work was supported by the National Key R&D Program of China (Grant Nos. 2018YFB2101100 and 2019YFB2101600), the National Natural Science Foundation of China (Grant No. 62176016), the Guizhou Province Science and Technology Project: Research and Demonstration of Science and Technology Big Data Mining Technology Based on Knowledge Graph (Qiankehe[2021] General 382), the Training Program of the Major Research Plan of the National Natural Science Foundation of China (Grant No. 92046015), and the Beijing Natural Science Foundation Program and Scientific Research Key Program of Beijing Municipal Commission of Education (Grant No. KZ202010025047).


Cite this article

Yang, T., Tong, C. Real-time detection network for tiny traffic sign using multi-scale attention module. Sci. China Technol. Sci. 65, 396–406 (2022). https://doi.org/10.1007/s11431-021-1950-9

