Illumination-Guided Transformer-Based Network for Multispectral Pedestrian Detection

Chu, Fuchen; Cao, Jiale; Shao, Zhuang; Pang, Yanwei

doi:10.1007/978-3-031-20497-5_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)

Included in the following conference series:

CAAI International Conference on Artificial Intelligence

Abstract

Multi-modal information (e.g., visible and thermal) can yield reliable and robust pedestrian detection results in various computer vision applications. Despite these broad applications, how to fuse the two modalities effectively remains a crucial problem. The self-attention operator of the transformer can capture long-range dependencies and integrate information across the entire input, and it has been widely used for cross-modal fusion. However, there is still a lack of analysis and design of transformers for the multispectral pedestrian detection task. To benefit from both RGB and thermal modalities, we propose a novel illumination-guided transformer-based network (ITNet) for multispectral pedestrian detection. First, unlike previous methods that apply the original transformer structure directly, we design two different transformer-based fusion modules so that the RGB and thermal modalities complement each other. Second, an illumination-guided module adaptively re-weights and fuses the multi-modal features according to the illumination conditions. Extensive evaluations on two benchmarks demonstrate the effectiveness of the proposed approach for multispectral pedestrian detection.
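
The two ideas named in the abstract, self-attention across modalities and illumination-dependent re-weighting, can be illustrated with a small sketch. This is not the ITNet implementation, which the abstract does not specify. It assumes single-head attention with identity projections and a hand-written brightness heuristic in place of a learned illumination module. The names `self_attention`, `illumination_weight`, `fuse`, `midpoint` and `sharpness` are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    # Single-head self-attention with identity Q/K/V projections.
    # Assumption: a real model would use learned projections and several heads.
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


def illumination_weight(rgb_pixels, midpoint=0.5, sharpness=10.0):
    # Hypothetical stand-in for a learned illumination module: maps the mean
    # brightness of pixels in [0, 1] to an RGB weight in (0, 1) with a sigmoid.
    mean = sum(rgb_pixels) / len(rgb_pixels)
    return 1.0 / (1.0 + math.exp(-sharpness * (mean - midpoint)))


def fuse(rgb_tokens, thermal_tokens, w_rgb):
    # Attend jointly over the tokens of both modalities, so every token can
    # draw on the other modality, then blend the two streams by illumination.
    n = len(rgb_tokens)
    mixed = self_attention(rgb_tokens + thermal_tokens)
    rgb_mixed, thermal_mixed = mixed[:n], mixed[n:]
    return [
        [w_rgb * r + (1.0 - w_rgb) * t for r, t in zip(rv, tv)]
        for rv, tv in zip(rgb_mixed, thermal_mixed)
    ]


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]      # toy RGB feature tokens
    thermal = [[0.5, 0.5], [0.2, 0.8]]  # toy thermal feature tokens
    w = illumination_weight([0.8, 0.9, 0.7])  # bright frame: weight favours RGB
    print(w, fuse(rgb, thermal, w))
```

In this sketch a bright frame pushes the weight toward the RGB stream and a dark frame pushes it toward the thermal stream, which is the qualitative behaviour the abstract attributes to the illumination-guided module.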

Acknowledgment

This work was supported in part by the National Key R&D Program of China (Grant No. 2018AAA0102802), Tianjin Research Program of Science and Technology (Grant No. 19ZXZNGX00050) and CAAI-Huawei MindSpore Open Fund.

Author information

Authors and Affiliations

School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Fuchen Chu, Jiale Cao & Yanwei Pang
Warwick Manufacturing Group, University of Warwick, Coventry, CV4 7AL, UK
Zhuang Shao

Corresponding author

Correspondence to Yanwei Pang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Lu Fang
Xiaomi Inc., Beijing, China
Daniel Povey
Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
JD Explore Academy, Beijing, China
Tao Mei
Chinese Academy of Sciences, Beijing, China
Ruiping Wang

About this paper

Cite this paper

Chu, F., Cao, J., Shao, Z., Pang, Y. (2022). Illumination-Guided Transformer-Based Network for Multispectral Pedestrian Detection. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_28

DOI: https://doi.org/10.1007/978-3-031-20497-5_28
Published: 17 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science, Computer Science (R0)
