Illumination-Guided Transformer-Based Network for Multispectral Pedestrian Detection

  • Conference paper
Artificial Intelligence (CICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)

Abstract

Multi-modal information (e.g., visible and thermal) can yield reliable and robust pedestrian detection results in various computer vision applications. Despite this broad applicability, how to fuse the two modalities effectively remains a crucial problem. The self-attention operator of the transformer captures long-range dependencies and integrates information across the entire input, and it has been widely used for cross-modal fusion. However, transformers have not yet been thoroughly analyzed or specifically designed for the multispectral pedestrian detection task. To benefit from both the RGB and thermal modalities, we propose a novel illumination-guided transformer-based network (ITNet) for multispectral pedestrian detection. First, unlike previous methods that apply the original transformer structure directly, we design two different transformer-based fusion modules so that the RGB and thermal modalities complement each other. Second, an illumination-guided module adaptively re-weights and fuses the multi-modal features according to the illumination conditions. Extensive evaluations on two benchmarks demonstrate the effectiveness of the proposed approach for multispectral pedestrian detection.
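
The abstract names two ideas, attention-based cross-modal fusion and illumination-dependent re-weighting, without giving their internals. The sketch below is only a generic illustration of those two ideas under stated assumptions; it is not ITNet's actual architecture. All function names are hypothetical, the attention has a single head and no learned projections, and the mean-brightness illumination score is a stand-in for what would presumably be a learned predictor.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats):
    """Scaled dot-product attention with queries from one modality and
    keys/values from the other. Assumption: no learned Q/K/V projections."""
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (N, M) similarities
    return softmax(scores, axis=-1) @ context_feats      # (N, d) attended context


def illumination_weight(rgb_image):
    """Hypothetical illumination score in [0, 1]: mean pixel brightness.
    Assumes pixel values are already scaled to [0, 1]."""
    return float(np.clip(rgb_image.mean(), 0.0, 1.0))


def illumination_guided_fusion(rgb_feats, thermal_feats, w_rgb):
    """Let each modality attend to the other, then blend the two enhanced
    feature sets with an illumination-dependent weight."""
    rgb_enh = rgb_feats + cross_attention(rgb_feats, thermal_feats)
    thermal_enh = thermal_feats + cross_attention(thermal_feats, rgb_feats)
    # Bright scene (w_rgb near 1) favors RGB; dark scene favors thermal.
    return w_rgb * rgb_enh + (1.0 - w_rgb) * thermal_enh


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_tokens = rng.normal(size=(6, 8))      # 6 spatial tokens, 8 channels
    thermal_tokens = rng.normal(size=(6, 8))
    night_image = np.full((4, 4, 3), 0.1)     # toy dark image
    w = illumination_weight(night_image)
    fused = illumination_guided_fusion(rgb_tokens, thermal_tokens, w)
    print(fused.shape, round(w, 2))
```

With a dark input the weight is small, so the fused features are dominated by the thermal branch; a bright input shifts the balance toward RGB. A single scalar weight is the simplest possible choice and is used here only to make the re-weighting idea concrete.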

Acknowledgment

This work was supported in part by the National Key R&D Program of China (Grant No. 2018AAA0102802), the Tianjin Research Program of Science and Technology (Grant No. 19ZXZNGX00050), and the CAAI-Huawei MindSpore Open Fund.

Corresponding author

Correspondence to Yanwei Pang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chu, F., Cao, J., Shao, Z., Pang, Y. (2022). Illumination-Guided Transformer-Based Network for Multispectral Pedestrian Detection. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_28

  • DOI: https://doi.org/10.1007/978-3-031-20497-5_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20496-8

  • Online ISBN: 978-3-031-20497-5

  • eBook Packages: Computer Science, Computer Science (R0)
