
Transformer fusion and histogram layer multispectral pedestrian detection network

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Owing to the complementarity of multispectral data, pedestrian detection performance can be significantly improved, so multispectral pedestrian detection has received considerable attention from the research community. However, existing algorithms still suffer from problems such as insufficient information exchange between the two streams and a lack of network designs tailored to the characteristics of each image source. In practical applications, different models are typically deployed for day and night, and the day and night models can simply be switched at inference time. We therefore propose two subnetworks, FTHd (Fusion Transformer Histogram day) and FTn (Fusion Transformer night), tailored to the characteristics of daytime and nighttime images. Daytime RGB images have pronounced texture features, so we first add a histogram layer to the input branch of the detection network and then insert a CFT (Cross-Modal Fusion Transformer) module to fuse features interactively; by leveraging the Transformer's self-attention, the network naturally performs both intra-modal and inter-modal fusion. At night, visible light is very weak and thermal images play the key role; since texture information is scarce, a complex network structure is unnecessary, so we merge the two streams into one to reduce computation and again add a CFT module for feature fusion. Compared with baseline methods, the proposed FTHd and FTn achieve improved pedestrian detection accuracy.
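As an illustration of the daytime design, below is a minimal PyTorch sketch of a soft, differentiable histogram layer of the kind attached to the RGB input branch: each bin is a radial-basis function with a learnable center and width, and bin memberships are aggregated by average pooling so the layer can be trained end to end with the detector. The bin count, pooling window, and initialization here are illustrative assumptions, not the exact FTHd configuration.

```python
import torch
import torch.nn as nn


class HistogramLayer(nn.Module):
    """Soft, differentiable local histogram over feature maps (a sketch;
    hyperparameters are assumptions, not the paper's configuration)."""

    def __init__(self, in_channels: int, num_bins: int = 4, pool: int = 2):
        super().__init__()
        # One learnable (center, width) pair per channel-bin combination.
        self.centers = nn.Parameter(torch.linspace(-1.0, 1.0, num_bins).repeat(in_channels))
        self.widths = nn.Parameter(torch.ones(in_channels * num_bins))
        self.num_bins = num_bins
        self.pool = nn.AvgPool2d(pool)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> replicate each channel once per bin.
        x = x.repeat_interleave(self.num_bins, dim=1)
        mu = self.centers.view(1, -1, 1, 1)
        gamma = self.widths.view(1, -1, 1, 1)
        # A radial-basis "vote" gives each pixel a soft bin membership in (0, 1];
        # average pooling then acts as a normalized local bin count.
        votes = torch.exp(-(gamma ** 2) * (x - mu) ** 2)
        return self.pool(votes)  # (B, C * num_bins, H / pool, W / pool)
```

Because every operation is differentiable, the bin centers and widths can adapt to the texture statistics of daytime RGB features during training.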
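The cross-modal fusion step can likewise be sketched as joint self-attention over the concatenated token sequences of the two modalities, which is the core idea the abstract attributes to the CFT module: a single attention pass covers intra-modal and inter-modal interactions simultaneously. The head count, normalization placement, and MLP sizing below are assumptions for illustration, not the published architecture.

```python
import torch
import torch.nn as nn


class CrossModalFusionBlock(nn.Module):
    """Joint self-attention over RGB and thermal tokens (a sketch of the
    CFT idea; architectural details are illustrative assumptions)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor):
        # Both inputs are (B, C, H, W) feature maps of identical shape.
        b, c, h, w = rgb.shape
        tokens = torch.cat(
            [rgb.flatten(2).transpose(1, 2), thermal.flatten(2).transpose(1, 2)], dim=1
        )  # (B, 2*H*W, C): RGB tokens followed by thermal tokens.
        x = self.norm1(tokens)
        # Self-attention over the joint sequence mixes features both within
        # and across modalities in one pass.
        tokens = tokens + self.attn(x, x, x, need_weights=False)[0]
        tokens = tokens + self.mlp(self.norm2(tokens))
        # Split the fused sequence back into per-modality feature maps.
        rgb_out, thermal_out = tokens.chunk(2, dim=1)

        def to_map(t: torch.Tensor) -> torch.Tensor:
            return t.transpose(1, 2).reshape(b, c, h, w)

        return to_map(rgb_out), to_map(thermal_out)
```

In a dual-stream daytime detector such a block would sit between corresponding backbone stages so each stream receives features enriched by the other; the exact placement in FTHd and FTn follows the designs described above.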


Data availability

The KAIST dataset can be accessed via https://github.com/SoonminHwang/rgbt-ped-detection, and the CVC-14 dataset via http://adas.cvc.uab.es/elektra/enigma-portfolio/cvc-14-visible-fir-day-night-pedestrian-sequence-dataset/.


Funding

Not applicable.

Author information

Contributions

DY and YZ conceived the idea. YZ and CF implemented the idea and wrote the main manuscript text. CF, HL, and CD prepared all figures and tables. DY and QL provided supervision. All authors reviewed the manuscript.

Corresponding author

Correspondence to Dongsheng Yang.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zang, Y., Fu, C., Yang, D. et al. Transformer fusion and histogram layer multispectral pedestrian detection network. SIViP 17, 3545–3553 (2023). https://doi.org/10.1007/s11760-023-02579-y

