Abstract
This paper proposes a deep RGB and thermal image fusion method for pedestrian detection. A two-branch structure learns features from the RGB and thermal images separately, and a cross-modality feature selection module fuses these features for detection. The method has three stages. First, features are learned from paired RGB and thermal images by a backbone network with a residual structure, to which a feature squeeze-excitation module is added. Second, the features learned by the two branches are fused, with the cross-modality feature selection module strengthening effective information and suppressing useless information during fusion. Finally, multi-scale features are fused for pedestrian detection. Two sets of experiments on the public KAIST pedestrian dataset show that the method outperforms state-of-the-art methods: the fused features are more robust, and the miss rate is noticeably reduced.
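The channel-gating idea behind the squeeze-excitation and feature selection stages can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the selection module, so the function `fuse_with_selection`, the choice to concatenate the two branches before gating, and all weights and toy feature maps below are assumptions made purely for illustration.

```python
import math

def global_avg_pool(fmap):
    """Squeeze: reduce each channel (an H x W grid) to its mean value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def se_gates(fmap, w1, b1, w2, b2):
    """Excitation: squeeze -> FC -> ReLU -> FC -> sigmoid, one gate per channel."""
    z = global_avg_pool(fmap)
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]

def reweight(fmap, gates):
    """Scale every value in a channel by that channel's gate."""
    return [[[g * x for x in row] for row in ch] for ch, g in zip(fmap, gates)]

def fuse_with_selection(rgb, thermal, w1, b1, w2, b2):
    """Hypothetical fusion: concatenate both branches, then gate channels jointly."""
    stacked = rgb + thermal  # channel-wise concatenation of the two branches
    gates = se_gates(stacked, w1, b1, w2, b2)
    return reweight(stacked, gates), gates

# Toy inputs (assumed): one 2x2 channel per modality.
rgb = [[[1.0, 1.0], [1.0, 1.0]]]
thermal = [[[3.0, 3.0], [3.0, 3.0]]]

# Hand-picked weights (assumed) that favour the stronger thermal response.
w1, b1 = [[-1.0, 1.0]], [0.0]
w2, b2 = [[-1.0], [1.0]], [0.0, 0.0]

fused, gates = fuse_with_selection(rgb, thermal, w1, b1, w2, b2)
print(gates)
```

With these hand-picked weights the thermal channel receives the larger gate, so it is emphasized while the RGB channel is attenuated. In a trained network the weights would be learned rather than fixed.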
Supported by the National Key R&D Program of China (2019YFB1309900), the National Natural Science Foundation of China (61702348, 61772351), the Beijing Nova Program of Science and Technology (Z191100001119075), the National Technology Innovation Special Zone (19-163-11-ZT-001-005-06) and the Academy for Multidisciplinary Studies, Capital Normal University (19530012005).
Copyright information
© 2021 IFIP International Federation for Information Processing
About this paper
Cite this paper
Li, M., Shao, Z., Shi, Z., Guan, Y. (2021). Deep Visible and Thermal Image Fusion with Cross-Modality Feature Selection for Pedestrian Detection. In: He, X., Shao, E., Tan, G. (eds) Network and Parallel Computing. NPC 2020. Lecture Notes in Computer Science, vol 12639. Springer, Cham. https://doi.org/10.1007/978-3-030-79478-1_10
DOI: https://doi.org/10.1007/978-3-030-79478-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79477-4
Online ISBN: 978-3-030-79478-1
eBook Packages: Computer Science, Computer Science (R0)