SE-YOLOv4: shuffle expansion YOLOv4 for pedestrian detection based on PixelShuffle

Liu, Mingsheng; Wan, Liang; Wang, Bo; Wang, Tingting

doi:10.1007/s10489-023-04456-0

Published: 25 January 2023

Volume 53, pages 18171–18188, (2023)

Applied Intelligence

Mingsheng Liu¹, Liang Wan¹ (ORCID: orcid.org/0000-0002-7677-8471), Bo Wang¹ & Tingting Wang¹

809 Accesses
7 Citations

Abstract

In pedestrian detection, the upsampling operation that YOLOv4 applies during feature aggregation compromises the integrity of feature information for small-scale and occluded targets. To address this issue, we propose a pedestrian detection model named Shuffle Expansion YOLOv4 (SE-YOLOv4), composed of a path aggregation network based on PixelShuffle (Shuffle-PANet) and an efficient pyramid atrous convolutional block attention module (EPA-CBAM), to improve the detection of small-scale and occluded pedestrians. First, we propose Shuffle-PANet, a feature aggregation network based on PixelShuffle that maintains the feature information integrity of small-scale and occluded targets by expanding high-resolution feature maps through convolutions and interchannel periodic shuffling instead of linear interpolation-based upsampling. Then, we propose EPA-CBAM, whose channel attention module (EPA-CAM) builds a pyramid structure and obtains fine-grained multiscale spatial information in different channels using dilated convolutions of corresponding sizes. On the CityPersons dataset, the miss rate of SE-YOLOv4 is 3.54% lower than that of the vanilla YOLOv4. Comparative experiments on four challenging pedestrian detection datasets show that our method achieves very competitive performance and maintains a reasonable balance between accuracy and speed.
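
The "interchannel periodic shuffling" mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `pixel_shuffle`, the nested-list tensor layout, and the toy input are assumptions made for illustration, and the channel ordering follows the common sub-pixel convolution convention. In a real network, a learned convolution would first produce the `C * r * r` input channels; only the rearrangement step is shown here.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Illustrative helper (hypothetical name): each group of r*r input
    channels is interleaved into one output channel that is r times
    larger in both spatial dimensions. No values are interpolated.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height, width = len(x[0]), len(x[0][0])
    channels_out = channels_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Toy input: four 1x1 channels become a single 2x2 channel.
features = [[[1]], [[2]], [[3]], [[4]]]
upsampled = pixel_shuffle(features, 2)
```

Every output value is copied from exactly one input value, so the spatial expansion itself discards nothing; this is the property the abstract contrasts with interpolation-based upsampling, which blends neighbouring values.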

Author information

Authors and Affiliations

State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China
Mingsheng Liu, Liang Wan, Bo Wang & Tingting Wang

Corresponding author

Correspondence to Liang Wan.

About this article

Cite this article

Liu, M., Wan, L., Wang, B. et al. SE-YOLOv4: shuffle expansion YOLOv4 for pedestrian detection based on PixelShuffle. Appl Intell 53, 18171–18188 (2023). https://doi.org/10.1007/s10489-023-04456-0

Accepted: 04 January 2023
Published: 25 January 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s10489-023-04456-0
