Abstract
Pedestrian detection is an important task in many computer vision applications. Since multispectral pedestrian detection can alleviate the difficulties of insufficient illumination at night, it has been rapidly developed in recent years. However, the way for effective color-thermal image fusion still needs further research. In this paper, we propose a Feature Aggregation Module (FAM) that can adaptively capture the cross-channel and cross-dimension information interaction of the two modalities. In addition, we develop a Feature Aggregation Network (FANet) that embeds the proposed FAM module into a two-stream network adapted from the YOLOv5. FANet has the advantages that its size is small (15 MB) and it runs fast (8 ms per frame). Extensive experiments on the KAIST dataset show that the proposed method is effective for multispectral pedestrian detection, especially in the night-time condition, for which the Miss Rate is only 8.91%. Moreover, we show that the saliency map computed from the thermal image can be incorporated into FANet to further improve the detection accuracy. In order to verify the generalization ability of the FAM module, we have also conducted experiments on the person re-identification datasets, namely Market1501 and Duke. The performance of our FAM compares favorably against existing feature fusion mechanisms on the two datasets.
Similar content being viewed by others
Data Availibility Statement
\(\bullet \) The KAIST-Common dataset that supports the findings of this study is available from https://github.com/mrkieumy/task-conditioned;
\(\bullet \) The KAIST-Saliency dataset that supports the findings of this study is available from https://github.com/Information-Fusion-Lab-Umass/Salient-Pedestrian-Detection;
\(\bullet \) The Market1501 dataset that supports the findings of this study is available from https://zheng-lab.cecs.anu.edu.au/Project/project_reid.html;
\(\bullet \) The Duke dataset that supports the findings of this study is available from https://exposing.ai/duke_mtmc/
References
Ha Q, Watanabe K, Karasawa T, Ushiku Y, Harada (2017) Mfnet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, pp 5108–5115
Hwang S, Park J, Kim N, Choi Y, Kweon IS (2013) Multispectral pedestrian detection: Benchmark dataset and baseline. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Wagner J, Fischer V, Herman M, Behnke S (2016) Multispectral pedestrian detection using deep fusion convolutional neural networks. In: 24th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)
Liu J, Zhang S, Wang S, Metaxas DN (2016) Multispectral deep neural networks for pedestrian detection. arXiv preprint arXiv:1611.02644
Choi H, Kim S, Park, K., Sohn, K (2016) Multi-spectral pedestrian detection based on accumulated object proposal with fully convolutional networks. In: International Conference on Pattern Recognition (ICPR)
Zhou K, Chen L, Cao X (2020) Improving multispectral pedestrian detection by addressing modality imbalance problems. In: Proceedings of European Conference on Computer Vision (ECCV), Springer, pp 787–803
Choi E-J, Park D-J (2010) Human detection using image fusion of thermal and visible image with new joint bilateral filter. In: 5th International Conference on Computer Sciences and Convergence Information Technology, IEEE, pp 882–885
Li C, Song D, Tong R, Tang M (2019) Illumination-aware Faster R-CNN for robust multispectral pedestrian detection. Pattern Recognition 85:161–171
Zhang L, Liu Z, Zhang S, Yang X, Qiao H, Huang K, Hussain A (2018) Cross-modality interactive attention network for multispectral pedestrian detection. Information Fusion 50:20–29
Li C, Song D, Tong R, Tang M (2018) Multispectral pedestrian detection via simultaneous detection and segmentation. In: British Machine Vision Conference (BMVC)
Cao Y, Guan D, Wu Y, Yang J, Cao Y, Yang MY (2019) Box-level segmentation supervised deep neural networks for accurate and real-time multispectral pedestrian detection. ISPRS Journal of Photogrammetry and Remote Sensing 150:70–79
Guan D, Cao Y, Yang J, Cao Y, Yang MY (2019) Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection. Information Fusion 50:148–157
Jocher G (2020) YOLOv5. GitHub
Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q (2015) Scalable person re-identification: A benchmark. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp 1116–1124
Zheng Z, Zheng L, Yang Y (2017) Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV)
Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C (2016) Performance measures and a data set for multi-target, multi-camera tracking. In: European Conference on Computer Vision Workshop on Benchmarking Multi-Target Tracking
Dollár P, Appel R, Belongie S, Perona P (2014) Fast feature pyramids for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(8):1532–1545
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, IEEE, pp 886–893
Sermanet P, Kavukcuoglu K, Chintala S, LeCun Y (2013) Pedestrian detection with unsupervised multi-stage feature learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3626–3633
Ouyang W, Wang X (2012) A discriminative deep model for pedestrian detection with occlusion handling. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp 3258–3265
Li J, Liang X, Shen S, Xu T, Feng J, Yan S (2017) Scale-aware fast rcnn for pedestrian detection. IEEE Transactions on Multimedia 20(4):985–996
Brazil G, Yin X, Liu X (2017) Illuminating pedestrians via simultaneous detection & segmentation. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp 4950–4959
Wang X, Xiao T, Jiang Y, Shuai S, Shen C (2018) Repulsion loss: Detecting pedestrians in a crowd. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Luo Y, Zhang C, Zhao M, Zhou H, Sun J (2020) Where, what, whether: Multi-modal learning meets pedestrian detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 14065–14073
Konig D, Adam M, Jarvers C, Layher G, Neumann H, Teutsch M (2017) Fully convolutional region proposal networks for multispectral person detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 49–56
Zhang L, Zhu X, Chen X, Yang X, Lei Z, Liu Z (2019) Weakly aligned cross-modal learning for multispectral pedestrian detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp 5127–5137
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
You Q, Jin H, Wang Z, Fang C, Luo J (2016) Image captioning with semantic attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4651–4659
Jie H, Li S, Gang S, Albanie S (2020) Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 42(8):2011–2023
Woo S, Park J, Lee J-Y, Kweon IS (2018) CBAM: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19
Qin Z, Zhang P, Wu F, Li X (2021) FcaNet: Frequency channel attention networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp 783–792
Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q (2020) ECA-Net: efficient channel attention for deep convolutional neural networks, 2020 ieee. In: CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Misra D, Nalamada T, Arasanipalai AU, Hou Q (2021) Rotate to attend: Convolutional triplet attention module. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp 3139–3148
Wang C-Y, Liao H-YM, Wu Y-H, Chen P-Y, Hsieh J-W, Yeh I-H (2020) Cspnet: A new backbone that can enhance learning capability of cnn. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp 390–391
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2117–2125
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 8759–8768
Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S (2019) Generalized intersection over union: A metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 658–666
Deng Z, Hu X, Zhu L, Xu X, Qin J, Han G, Heng P-A (2018) R3net: Recurrent residual refinement network for saliency detection. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), AAAI Press, pp 684–690
Ghose D, Desai SM, Bhattacharya S, Chakraborty D, Fiterau M, Rahman T (2019) Pedestrian detection in thermal images using saliency maps. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp 0–0
Luo H, Gu Y, Liao X, Lai S, Jiang W (2019) Bag of tricks and a strong baseline for deep person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp 0–0
Dollar P, Wojek C, Schiele B, Perona P (2011) Pedestrian detection: An evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(4):743–761
Kieu M, Bagdanov AD, Bertini M, Del Bimbo A (2020) Task-conditioned domain adaptation for pedestrian detection in thermal imagery. In: Proceedings of European Conference on Computer Vision (ECCV), Springer, pp 546–562
Xu D, Ouyang W, Ricci E, Wang X, Sebe N (2017) Learning cross-modal deep representations for robust pedestrian detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5363–5371
Guo T, Huynh CP, Solh M (2019) Domain-adaptive pedestrian detection in thermal images. In: 2019 IEEE International Conference on Image Processing (ICIP), IEEE, pp 1660–1664
Kieu M, Bagdanov AD, Bertini M, Del Bimbo A (2019) Domain adaptation for privacy-preserving pedestrian detection in thermal imagery. In: International Conference on Image Analysis and Processing (ICIAP), Springer, pp 203–213
Vandersteegen M, Van Beeck K, Goedemé T (2018) Real-time multispectral pedestrian detection with a single-pass deep neural network. In: International Conference Image Analysis and Recognition (ICIAP), Springer, pp 419–426
Zheng Y, Izzat IH, Ziaee S (2019) Gfd-ssd: gated fusion double ssd for multispectral pedestrian detection. arXiv preprint arXiv:1903.06999
Sun Y, Zheng L, Deng W, Wang S (2017) SVDNet for pedestrian retrieval. 2017 IEEE International Conference on Computer Vision (ICCV)
Ristani E, Tomasi C (2018) Features for multi-target multi-camera tracking and re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6036–6046
Zhang X, Luo H, Fan X, Xiang W, Sun Y, Xiao Q, Jiang W, Zhang C, Sun J (2017) Alignedreid: Surpassing human-level performance in person re-identification. arXiv preprint arXiv:1711.08184
Fan X, Luo H, Zhang X, He L, Zhang C, Jiang W (2018) Scpnet: Spatial-channel parallelism network for joint holistic and partial person re-identification. In: Asian Conference on Computer Vision (ACCV), Springer, pp 19–34
Zhong Z, Zheng L, Zheng Z, Li S, Yang Y (2018) Camstyle: A novel data augmentation method for person re-identification. IEEE Transactions on Image Processing (ICIP) 28(3):1176–1190
Qian X, Fu Y, Xiang T, Wang W, Qiu J, Wu Y, Jiang Y-G, Xue X (2018) Pose-normalized image generation for person re-identification. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 650–667
Si J, Zhang H, Li C-G, Kuen J, Kong X, Kot AC, Wang G (2018) Dual attention matching network for context-aware feature sequence based person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5363–5372
Li W, Zhu X, Gong S (2018) Harmonious attention network for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2285–2294
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest in this manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gong, Y., Wang, L. & Xu, L. A feature aggregation network for multispectral pedestrian detection. Appl Intell 53, 22117–22131 (2023). https://doi.org/10.1007/s10489-023-04628-y
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-023-04628-y