A feature aggregation network for multispectral pedestrian detection

Applied Intelligence

Abstract

Pedestrian detection is an important task in many computer vision applications. Because multispectral pedestrian detection can alleviate the difficulties caused by insufficient illumination at night, it has developed rapidly in recent years. However, how to fuse color and thermal images effectively still requires further research. In this paper, we propose a Feature Aggregation Module (FAM) that adaptively captures cross-channel and cross-dimension information interactions between the two modalities. We further develop a Feature Aggregation Network (FANet) that embeds the proposed FAM into a two-stream network adapted from YOLOv5. FANet is small (15 MB) and fast (8 ms per frame). Extensive experiments on the KAIST dataset show that the proposed method is effective for multispectral pedestrian detection, especially at night, where its miss rate is only 8.91%. Moreover, we show that a saliency map computed from the thermal image can be incorporated into FANet to further improve detection accuracy. To verify the generalization ability of FAM, we have also conducted experiments on two person re-identification datasets, Market1501 and Duke, on which FAM compares favorably against existing feature fusion mechanisms.
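
The abstract does not spell out the internals of FAM. As a rough point of reference only, the sketch below shows a generic squeeze-and-excitation (SE) style channel-attention fusion of color and thermal feature maps, written in pure Python so it runs without a deep learning framework. The function name `se_fusion`, the weight shapes, and the concatenate, reweight, and sum layout are assumptions for illustration; this is not the authors' FAM, which according to the abstract also models cross-dimension interactions that this sketch omits.

```python
import math
from typing import List

# A feature map is stored as channels x rows x columns (C x H x W).
FeatureMap = List[List[List[float]]]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _matvec(m: List[List[float]], v: List[float]) -> List[float]:
    # Dense matrix-vector product, standing in for a fully connected layer.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def se_fusion(f_rgb: FeatureMap, f_thermal: FeatureMap,
              w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """Fuse color and thermal feature maps with SE-style channel attention.

    Hypothetical illustration only; this is not the paper's FAM.
    w1 has shape (2C/r, 2C) and w2 has shape (2C, 2C/r) for a reduction ratio r.
    """
    if len(f_rgb) != len(f_thermal):
        raise ValueError("both modalities must have the same number of channels")
    c = len(f_rgb)
    # List concatenation stacks the two modalities along the channel axis: 2C maps.
    x = f_rgb + f_thermal
    # Squeeze: global average pooling gives one descriptor per channel.
    squeeze = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields a weight in (0, 1) per channel.
    hidden = [max(0.0, h) for h in _matvec(w1, squeeze)]
    weights = [_sigmoid(z) for z in _matvec(w2, hidden)]
    # Reweight every channel, then sum the color and thermal halves back to C maps.
    scaled = [[[weights[k] * v for v in row] for row in ch] for k, ch in enumerate(x)]
    return [
        [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(scaled[k], scaled[k + c])]
        for k in range(c)
    ]


rgb = [[[1.0, 2.0], [3.0, 4.0]]]      # C = 1, H = W = 2
thermal = [[[4.0, 2.0], [0.0, 2.0]]]
# All-zero weights give every channel weight sigmoid(0) = 0.5, i.e. a plain average.
print(se_fusion(rgb, thermal, [[0.0, 0.0]], [[0.0], [0.0]]))
# [[[2.5, 2.0], [1.5, 3.0]]]
```

Because the per-channel weights are learned rather than fixed, a trained network of this kind can down-weight the color channels and lean on the thermal ones when the color input carries little signal, which is the situation the abstract highlights for night-time scenes.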

Data Availability Statement

  • The KAIST-Common dataset that supports the findings of this study is available from https://github.com/mrkieumy/task-conditioned;

  • The KAIST-Saliency dataset that supports the findings of this study is available from https://github.com/Information-Fusion-Lab-Umass/Salient-Pedestrian-Detection;

  • The Market1501 dataset that supports the findings of this study is available from https://zheng-lab.cecs.anu.edu.au/Project/project_reid.html;

  • The Duke dataset that supports the findings of this study is available from https://exposing.ai/duke_mtmc/

Notes

  1. https://github.com/Information-Fusion-Lab-Umass/Salient-Pedestrian-Detection

Author information

Corresponding author

Correspondence to Lu Wang.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest in this manuscript.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gong, Y., Wang, L. & Xu, L. A feature aggregation network for multispectral pedestrian detection. Appl Intell 53, 22117–22131 (2023). https://doi.org/10.1007/s10489-023-04628-y

  • DOI: https://doi.org/10.1007/s10489-023-04628-y
