Abstract
The numerous deep convolutions in YOLOv4 generate many redundant background features, preventing the network from focusing on pedestrians at a specific scale. To address this, we propose MGA-YOLOv4 (Mask-Guided Attention YOLOv4), a method that can dynamically select the most crucial features from a cluttered background. First, we design a semantic segmentation encoder-decoder network to generate a fine-grained pixel-level mask that serves as a weakly supervised signal in each detection branch. Second, we build a mask-guided attention module that produces attention weights along the channel and spatial dimensions and encodes them into the mask to highlight pedestrians of a specific scale while avoiding background interference. To demonstrate the effectiveness of MGA, we visualise the network's attention maps and conduct ablation experiments. The results show that the proposed method, combined with the channel-concatenated-spatial attention variant, reduces the miss rate by 1.82% compared with the original YOLOv4. Comparison experiments on five challenging pedestrian detection datasets show that our method achieves very competitive performance against state-of-the-art methods and reaches a favourable trade-off between speed and accuracy.
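The core idea — gating a feature map with channel and spatial attention weights and then modulating it by a pixel-level pedestrian mask — can be illustrated with a minimal NumPy sketch. This is not the authors' exact module: the learned convolutional gates are replaced here by simple average pooling for clarity, and the function name and residual `1 + mask` weighting are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mask_guided_attention(features, mask):
    """Reweight a feature map (C, H, W) with channel and spatial
    attention, then modulate it by a pixel-level mask (H, W)."""
    # Channel attention: squeeze the spatial dims and gate each
    # channel (in the spirit of squeeze-and-excitation).
    channel_weights = sigmoid(features.mean(axis=(1, 2)))   # (C,)
    out = features * channel_weights[:, None, None]
    # Spatial attention: pool across channels and gate each
    # location (in the spirit of CBAM's spatial branch).
    spatial_weights = sigmoid(out.mean(axis=0))             # (H, W)
    out = out * spatial_weights[None, :, :]
    # Mask guidance: amplify pedestrian pixels, leave background
    # unboosted (residual weighting so the mask never zeroes features).
    out = out * (1.0 + mask)[None, :, :]
    return out

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))       # toy feature map
mask = np.zeros((8, 8))
mask[2:6, 3:5] = 1.0                        # toy pedestrian mask
out = mask_guided_attention(feat, mask)
print(out.shape)
```

In the actual detector the mask comes from the segmentation branch and the gates are learned, but the data flow — channel gate, spatial gate, mask modulation — follows this pattern.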
Cite this article
Wang, T., Wan, L., Tang, L. et al. MGA-YOLOv4: a multi-scale pedestrian detection method based on mask-guided attention. Appl Intell 52, 15308–15324 (2022). https://doi.org/10.1007/s10489-021-03061-3