Abstract
As a key technology for intelligent vehicles, traffic sign detection remains a challenging task because the target objects are tiny. To address this challenge, we present a novel detection network, improved from YOLOv3, that detects tiny traffic signs with high precision in real time. First, a visual multi-scale attention module (MSAM), a lightweight yet effective module, is devised to fuse multi-scale feature maps with channel weights and spatial masks; it increases the representational power of the network by emphasizing useful features and suppressing unnecessary ones. Second, we effectively exploit fine-grained features of tiny objects from shallower layers by modifying the Darknet-53 backbone and adding one prediction head to YOLOv3. Finally, a receptive field block is added to the neck of the network to broaden the receptive field. Experiments demonstrate the effectiveness of our network both quantitatively and qualitatively: on the challenging Tsinghua-Tencent 100K (TT100K) dataset, it reaches 0.965 mAP@0.5 at 55.56 FPS for 512 × 512 images.
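To make the attention idea concrete, the following is a minimal NumPy sketch of the two-stage gating the abstract describes (channel weights followed by a spatial mask). It is an illustration only, not the authors' implementation: the learned 1×1 projection is stood in for by a plain weight matrix `w_channel`, and the spatial mask is taken as a sigmoid over the channel-wise mean, both of which are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def msam_fuse(feat, w_channel):
    """Hypothetical sketch of multi-scale attention gating.

    feat:      feature map of shape (C, H, W)
    w_channel: assumed (C, C) learned projection for channel attention
    """
    # Channel weights: global average pooling -> projection -> sigmoid.
    gap = feat.mean(axis=(1, 2))                 # (C,) global context
    ch_w = sigmoid(w_channel @ gap)              # (C,) weights in (0, 1)
    feat = feat * ch_w[:, None, None]            # emphasize useful channels

    # Spatial mask: per-location gate from the channel-wise mean.
    mask = sigmoid(feat.mean(axis=0))            # (H, W) values in (0, 1)
    return feat * mask[None, :, :]               # suppress unhelpful locations
```

Because both gates lie in (0, 1), the module rescales rather than replaces features, so it can be dropped between pyramid levels without changing tensor shapes.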
Additional information
This work was supported by the National Key R&D Program of China (Grant Nos. 2018YFB2101100 and 2019YFB2101600), the National Natural Science Foundation of China (Grant No. 62176016), the Guizhou Province Science and Technology Project: Research and Demonstration of Science and Technology Big Data Mining Technology Based on Knowledge Graph (Qiankehe[2021] General 382), the Training Program of the Major Research Plan of the National Natural Science Foundation of China (Grant No. 92046015), and the Beijing Natural Science Foundation Program and Scientific Research Key Program of Beijing Municipal Commission of Education (Grant No. KZ202010025047).
Yang, T., Tong, C. Real-time detection network for tiny traffic sign using multi-scale attention module. Sci. China Technol. Sci. 65, 396–406 (2022). https://doi.org/10.1007/s11431-021-1950-9