LGADet: Light-weight Anchor-free Multispectral Pedestrian Detection with Mixed Local and Global Attention

Zuo, Xin; Wang, Zhi; Liu, Yue; Shen, Jifeng; Wang, Haoran

doi:10.1007/s11063-022-10991-7

Published: 13 August 2022

Volume 55, pages 2935–2952, (2023)

Neural Processing Letters

Xin Zuo¹,
Zhi Wang¹,
Yue Liu²,
Jifeng Shen ORCID: orcid.org/0000-0002-4356-1831² &
…
Haoran Wang³

Abstract

Balancing accuracy and efficiency is essential for multispectral pedestrian detection in practical applications. To address this trade-off, a light-weight anchor-free multispectral pedestrian detection method with a mixed Local and Global Attention (LGA) mechanism is proposed, with the aim of narrowing the gap between academic research and practical application. The anchor-free detection pipeline, equipped with a light-weight backbone, yields a significant speedup, while the mixed attention mechanism refines features to improve accuracy. Specifically, an anchor-free pedestrian detection framework with a MobileNetV2 backbone is first used to reduce computational complexity and speed up model inference. Second, the method uses a DMAF module to enhance the complementary information between RGB and thermal image features. Finally, local and global attention mechanisms improve the quality of feature fusion, thus enhancing detection accuracy. Experiments on the KAIST, FLIR and CVC-14 datasets show significant improvements in miss rate (MR) compared with other state-of-the-art methods. When deployed on the Nvidia Jetson TX2, the method achieves a good compromise between accuracy and speed.
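The abstract reports results in terms of MR without defining it. The following is a minimal sketch, assuming MR refers to the log-average miss rate commonly reported on pedestrian benchmarks such as KAIST: the miss rate sampled at nine false-positives-per-image (FPPI) reference points evenly spaced in log space between 10^-2 and 10^0, then averaged geometrically. The function name, its inputs, and the sample curve are illustrative and are not taken from the article.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Log-average miss rate over FPPI reference points in [1e-2, 1e0].

    Assumption: `fppi` is sorted in ascending order and `miss_rate[i]` is the
    miss rate measured at `fppi[i]` (one point per detection score threshold).
    """
    # Reference points evenly spaced in log space: 10^-2, 10^-1.75, ..., 10^0.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    log_values = []
    for ref in refs:
        # Use the miss rate at the largest FPPI not exceeding the reference.
        reachable = [mr for f, mr in zip(fppi, miss_rate) if f <= ref]
        # If the curve never reaches this FPPI, count every pedestrian as missed.
        mr_at_ref = reachable[-1] if reachable else 1.0
        # Clamp to avoid log(0) for a perfect detector.
        log_values.append(math.log(max(mr_at_ref, 1e-10)))

    # Geometric mean of the sampled miss rates.
    return math.exp(sum(log_values) / len(log_values))


if __name__ == "__main__":
    # Hypothetical curve, for illustration only.
    example_fppi = [0.005, 0.02, 0.1, 0.5, 1.0]
    example_mr = [0.60, 0.40, 0.25, 0.15, 0.10]
    print(f"log-average MR: {log_average_miss_rate(example_fppi, example_mr):.4f}")
```

Averaging in log space keeps the low-FPPI end of the curve from being swamped by the high-FPPI end, which is why a single summary number of this kind is often preferred to the miss rate at one operating point.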

Acknowledgements

This work was supported in part by the NSF of China under Grant No. 61903164, in part by the NSF of Jiangsu Province in China under Grant BK20191427, in part by the Foundation of Key Laboratory of Aerospace System Simulation (6142002200301), and in part by the Fundamental Research Funds for the Central Universities of China (N2004022).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, 212003, China
Xin Zuo & Zhi Wang
School of Electronic and Informatics Engineering, Jiangsu University, Zhenjiang, 212013, China
Yue Liu & Jifeng Shen
College of Information Science and Engineering, Northeastern University, Shenyang, 110819, China
Haoran Wang

Corresponding author

Correspondence to Jifeng Shen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zuo, X., Wang, Z., Liu, Y. et al. LGADet: Light-weight Anchor-free Multispectral Pedestrian Detection with Mixed Local and Global Attention. Neural Process Lett 55, 2935–2952 (2023). https://doi.org/10.1007/s11063-022-10991-7

Accepted: 28 July 2022
Published: 13 August 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11063-022-10991-7
