
RCSLFNet: a novel real-time pedestrian detection network based on re-parameterized convolution and channel-spatial location fusion attention for low-resolution infrared image

  • Research
  • Published in Journal of Real-Time Image Processing

Abstract

This study introduces a novel real-time infrared pedestrian detection algorithm. The proposed approach leverages re-parameterized convolution and channel-spatial location fusion attention to tackle the difficulties posed by low resolution, partial occlusion, and environmental interference in infrared pedestrian images, factors that have historically hindered accurate pedestrian detection with traditional algorithms. First, to address the weak feature representation of infrared pedestrian targets caused by low resolution and partial occlusion, a new attention module that integrates channel and spatial attention is devised and introduced into CSPDarkNet53 to form a new backbone, CSLF-DarkNet53. The designed attention module strengthens the feature expression of pedestrian targets and makes them more prominent against complex backgrounds. Second, to improve detection efficiency and accelerate convergence, a multi-branch decoupled detector head is designed to perform the classification and localization of infrared pedestrians separately. Finally, to improve real-time performance without sacrificing precision, we introduce re-parameterized convolution (RepConv), which uses a parameter identity transformation to decouple the training process from the detection process. During training, a multi-branch structure with convolution kernels of different scales is used to enhance the fitting ability of small convolution kernels. Compared with nine classical detection algorithms, the experimental results show that the proposed RCSLFNet not only accurately detects partially occluded infrared pedestrians in complex environments but also achieves better real-time performance on the KAIST dataset: the mAP@0.5 reaches 86%, 2.9% higher than the baseline, with a detection time of 0.0081 s.
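The re-parameterization idea described above can be illustrated with a minimal, single-channel NumPy sketch. This is not the paper's implementation (channel dimensions, additional branch scales, and batch-normalization folding are simplified away); it only shows the core identity transformation: a multi-branch sum of a 3x3 and a 1x1 convolution used during training can be folded into one equivalent 3x3 convolution for inference, because convolution is linear in the kernel.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D cross-correlation for a square, odd-sized kernel."""
    p = k.shape[0] // 2
    xp = np.pad(x, p)
    H, W = x.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))    # toy single-channel feature map
k3 = rng.standard_normal((3, 3))   # 3x3 training-time branch
k1 = rng.standard_normal((1, 1))   # 1x1 training-time branch

# Training-time output: the multi-branch structure sums both branches.
y_train = conv2d_same(x, k3) + conv2d_same(x, k1)

# Re-parameterization: embed the 1x1 kernel at the centre of the 3x3 kernel,
# yielding a single fused kernel for the detection (inference) process.
k_fused = k3.copy()
k_fused[1, 1] += k1[0, 0]
y_infer = conv2d_same(x, k_fused)

print(np.allclose(y_train, y_infer))  # True
```

The fused network therefore pays the cost of only one convolution per layer at inference time while retaining the richer fitting ability of the multi-branch structure during training.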


Data availability statement

Not applicable.

Abbreviations

BN: Batch normalization (2D) layer
CBS: Convolution, batch normalization, SiLU
CNN: Convolutional neural network
DL: Deep learning
FN: False negative
FP: False positive
FPN: Feature pyramid network
FPS: Frames per second
IoU: Intersection over union
ML: Machine learning
R-CNN: Region-based CNN
RNN: Recurrent neural network
SSD: Single-shot detector
TN: True negative
TP: True positive
YOLO: You only look once


Funding

This research was funded by the National Natural Science Foundation of China (51804250); China Postdoctoral Science Foundation (2020M683522); and Natural Science Basic Research Program of Shaanxi (2024JC-YBMS-490).

Author information

Authors and Affiliations

Authors

Contributions

Innovation, SH; methodology, QZL; experiments, QZL; data collation and analysis, XM and HJL; data verification, QZL and QYW; investigation, TH; resources, QZL; data curation, QZL; writing, original draft, SH and QZL; writing, review and editing, SH and QZL; project administration, SH; funding acquisition, SH.

Corresponding author

Correspondence to Xu Ma.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hao, S., Liu, Z., Ma, X. et al. RCSLFNet: a novel real-time pedestrian detection network based on re-parameterized convolution and channel-spatial location fusion attention for low-resolution infrared image. J Real-Time Image Proc 21, 89 (2024). https://doi.org/10.1007/s11554-024-01469-x
