Abstract
Thermal pedestrian detection is a core problem in computer vision. Knowledge from the corresponding visible images is commonly used to improve performance in the thermal domain. However, existing methods assume that visible and thermal images share the same resolution, which rarely holds in practice: because thermal imaging equipment is expensive, thermal images usually have a lower resolution than visible images. To address this issue, we propose a new method named Disentanglement Then Restoration (DTR). The key idea is to disentangle features into content features and modal features, and to restore the complete content features of thermal images by learning how content features change with resolution. Specifically, we first train an object detector such as YOLO to initialize our model. Then, we train a feature disentanglement network that separates the backbone features into content features and modal features. Finally, we freeze the feature disentanglement network and enforce consistency between the content features of the visible image and those of the upsampled thermal image, which restores the complete content features of low-resolution thermal images. Experimental results on the public KAIST and LLVIP datasets demonstrate the effectiveness of our method. Code is available at https://github.com/HaMeow-lst1/DTR.
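The restoration step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `upsample_nearest` and `content_consistency_loss` are hypothetical, and both the nearest-neighbour interpolation and the mean-squared-error form of the consistency term are assumptions, since the abstract does not specify either. The sketch only shows the general idea of bringing a low-resolution thermal map up to the visible resolution and then penalizing differences between the two content feature maps.

```python
from typing import List

Matrix = List[List[float]]


def upsample_nearest(image: Matrix, scale: int) -> Matrix:
    """Nearest-neighbour upsampling (an assumed choice of interpolation).

    Each value is repeated `scale` times along both axes, so an H x W map
    becomes (H * scale) x (W * scale).
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    return [
        [value for value in row for _ in range(scale)]
        for row in image
        for _ in range(scale)
    ]


def content_consistency_loss(visible_content: Matrix, thermal_content: Matrix) -> float:
    """Mean squared difference between two content feature maps.

    The MSE form is an assumption for illustration; the abstract only says
    that content features are forced to be consistent.
    """
    if len(visible_content) != len(thermal_content):
        raise ValueError("feature maps must have the same height")
    total = 0.0
    count = 0
    for vis_row, thr_row in zip(visible_content, thermal_content):
        if len(vis_row) != len(thr_row):
            raise ValueError("feature maps must have the same width")
        for vis, thr in zip(vis_row, thr_row):
            total += (vis - thr) ** 2
            count += 1
    if count == 0:
        raise ValueError("feature maps must not be empty")
    return total / count


# Toy usage: a 2 x 2 "thermal" map is upsampled to 4 x 4 and compared
# with a 4 x 4 "visible" map of the same size.
thermal_low_res = [[1.0, 2.0], [3.0, 4.0]]
thermal_upsampled = upsample_nearest(thermal_low_res, scale=2)
visible_map = [[1.0] * 4 for _ in range(4)]
loss = content_consistency_loss(visible_map, thermal_upsampled)
```

In a real pipeline the two inputs to the loss would be content features produced by the frozen disentanglement network rather than raw pixel values, and the loss would be minimized by gradient descent in a deep learning framework.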
Data Availability
The KAIST dataset analyzed during the current study is available at https://soonminhwang.github.io/rgbt-ped-detection/. The LLVIP dataset analyzed during the current study is available at https://bupt-ai-cz.github.io/LLVIP/.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (62276048) and the Sichuan Science and Technology Program (2020YFG0476).
Author information
Contributions
SL proposed the method and designed the experiments. SL, JC, and LT carried out the experiments. SL, MY, and TL wrote the main manuscript text. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
There are no conflicts of interest.
About this article
Cite this article
Li, S., Cui, J., Ye, M. et al. Thermal pedestrian detection based on different resolution visual image. SIViP 17, 4347–4355 (2023). https://doi.org/10.1007/s11760-023-02667-z
DOI: https://doi.org/10.1007/s11760-023-02667-z