Abstract
Multi-modal information (e.g., visible and thermal) can generate reliable and robust pedestrian detection results in various computer vision applications. Despite its broad applications, it remains a crucial problem that how to fuse the two modalities effectively. The self-attention operator of transformer can obtain long-range dependencies and integrate information across the entire input, which has been widely used for cross-modal fusion. However, there is still a lack of further analysis and design for transformer to use in multispectral pedestrian detection task. To benefit from both RGB and thermal modalities, we propose a novel illumination-guided transformer-based network (ITNet) for multispectral pedestrian detection in this paper. Firstly, different from the previous methods that apply the original transformer structure directly, we designed two different transformer-based fusion modules to make the RGB and thermal modalities complement each other. Secondly, an illumination-guided module is used to adaptively re-weight and fuse the multi-modal features according to the illumination conditions. Extensive evaluations on two benchmarks demonstrate the effectiveness of our proposed approach for multispectral pedestrian detection.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Cao, J., Pang, Y., Li, X.: Pedestrian detection inspired by appearance constancy and shape symmetry. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1316–1324 (2016)
Cao, J., Pang, Y., Li, X.: Learning multilayer channel features for pedestrian detection. IEEE Trans. Image Process. 26(7), 3210–3220 (2017)
Cao, Y., Guan, D., Wu, Y., Yang, J., Cao, Y., Yang, M.Y.: Box-level segmentation supervised deep neural networks for accurate and real-time multispectral pedestrian detection. ISPRS J. Photogram. Remote Sens. 150, 70–79 (2019)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Choi, H., Kim, S., Park, K., Sohn, K.: Multi-spectral pedestrian detection based on accumulated object proposal with fully convolutional networks. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 621–626. IEEE (2016)
Dong, J., Hu, Z., Zhou, Y.: Revisiting knowledge distillation for image captioning. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds.) CICAI 2021. LNCS, vol. 13069, pp. 613–625. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-93046-2_52
Dosovitskiy, A., et al.: An image is worth 16\(\,\times \,\)16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Gonzalez, A., et al.: Pedestrian detection at day/night time with visible and FIR cameras: a comparison. Pattern Recogn. 16(6), 820 (2016)
Guan, D., Cao, Y., Yang, J., Cao, Y., Yang, M.Y.: Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection. Inf. Fusion 50, 148–157 (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE International Conference on Computer Vision (2016)
Huang, B., Xue, J., Lu, K., Tan, Y., Zhao, Y.: MPNet: multi-scale parallel codec net for medical image segmentation. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds.) CICAI 2021. LNCS, vol. 13069, pp. 492–503. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-93046-2_42
Hwang, S., Park, J., Kim, N., Choi, Y., So Kweon, I.: Multispectral pedestrian detection: benchmark dataset and baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1037–1045 (2015)
Kieu, M., Bagdanov, A.D., Bertini, M., del Bimbo, A.: Task-conditioned domain adaptation for pedestrian detection in thermal imagery. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12367, pp. 546–562. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58542-6_33
Konig, D., Adam, M., Jarvers, C., Layher, G., Neumann, H., Teutsch, M.: Fully convolutional region proposal networks for multispectral person detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the Advances in Neural Information Processing Systems (2012)
Li, C., Song, D., Tong, R., Tang, M.: Multispectral pedestrian detection via simultaneous detection and segmentation. In: Proceedings of the British Machine Vision Conference (2018)
Li, C., Song, D., Tong, R., Tang, M.: Illumination-aware faster R-CNN for robust multispectral pedestrian detection. Pattern Recogn. 85, 161–171 (2019)
Li, C., Chen, D., Chen, J., Dai, H.: A cross-layer fusion multi-target detection and recognition method based on improved FPN model in complex traffic environment. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds.) CICAI 2021. LNCS, vol. 13069, pp. 323–334. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-93046-2_28
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
Liu, J., Zhang, S., Wang, S., Metaxas, D.N.: Multispectral deep neural networks for pedestrian detection. In: Proceedings of the British Machine Vision Conference (2016)
Liu, W., Liao, S., Ren, W., Hu, W., Yu, Y.: High-level semantic feature detection: a new perspective for pedestrian detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE International Conference on Computer Vision (2021)
Park, K., Kim, S., Sohn, K.: Unified multi-spectral pedestrian detection based on probabilistic fusion networks. Pattern Recogn. 80, 143–155 (2018)
Qingyun, F., Dapeng, H., Zhaokui, W.: Cross-modality fusion transformer for multispectral object detection. arXiv preprint arXiv:2111.00273 (2021)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. In: Proceedings of the International Conference on Machine Learning (2021)
Zhang, H., Fromont, E., Lefèvre, S., Avignon, B.: Low-cost multispectral scene analysis with modality distillation. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision (2022)
Zhang, H., Huang, R., Yuan, L.: Robust indoor visual-inertial SLAM with pedestrian detection. In: 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 802–807. IEEE (2021)
Zhang, L., et al.: Cross-modality interactive attention network for multispectral pedestrian detection. Inf. Fusion 50, 20–29 (2019)
Zhang, L., Zhu, X., Chen, X., Yang, X., Lei, Z., Liu, Z.: Weakly aligned cross-modal learning for multispectral pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision (2019)
Zhou, K., Chen, L., Cao, X.: Improving multispectral pedestrian detection by addressing modality imbalance problems. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12363, pp. 787–803. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58523-5_46
Acknowledgment
This work was supported in part by the National Key R &D Program of China (Grant No. 2018AAA0102802), Tianjin Research Program of Science and Technology (Grant No. 19ZXZNGX00050) and CAAI-Huawei MindSpore Open Fund.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chu, F., Cao, J., Shao, Z., Pang, Y. (2022). Illumination-Guided Transformer-Based Network for Multispectral Pedestrian Detection. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science(), vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_28
Download citation
DOI: https://doi.org/10.1007/978-3-031-20497-5_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer ScienceComputer Science (R0)