Abstract
Cross-modality visible-infrared person re-identification (VI-ReID) aims to match images of the same identity across the visible and infrared modalities. The task is highly challenging because it not only inherits the cross-camera variations of traditional person ReID but also suffers from the large discrepancy between the two modalities. Many existing VI-ReID methods extract modality-shared information from global features through single-stream or double-stream networks, ignoring the complementarity between fine-grained and coarse-grained information. To address this problem, we design a multi-granularity feature utilization network (MFUN) that compensates for the shortage of modality-shared features by promoting the complementarity between coarse-grained and fine-grained features. First, to better learn fine-grained shared features, we design a local feature constraint module that applies both a hard-mining triplet loss and a heterogeneous center loss to local features in the common subspace, promoting intra-class compactness and inter-class separability at both the sample level and the class-center level. Second, a multi-modality feature aggregation module fuses the global features of the two modalities to narrow the modality gap. Combining these two modules allows visible and infrared image features to be fused more effectively, alleviating the modality discrepancy and supplementing the missing modality-shared information. Extensive experiments on the RegDB and SYSU-MM01 datasets demonstrate that the proposed MFUN outperforms state-of-the-art methods. Our code is available at https://github.com/ZhangYinyinzzz/MFUN.
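The two constraints named above can be sketched in a few lines. The following is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the batch-hard mining strategy (farthest positive, nearest negative per anchor), and the squared-Euclidean form of the center term are assumptions for illustration only.

```python
import numpy as np

def batch_hard_triplet_loss(feats, labels, margin=0.3):
    """Hard-mining triplet loss: for each anchor, mine the farthest
    same-identity sample and the nearest different-identity sample,
    then penalize violations of the margin."""
    # Pairwise Euclidean distance matrix over the batch.
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    loss = 0.0
    for i in range(len(feats)):
        pos = dists[i][labels == labels[i]]   # distances to positives (incl. self)
        neg = dists[i][labels != labels[i]]   # distances to negatives
        hardest_pos = pos.max()               # farthest positive
        hardest_neg = neg.min()               # nearest negative
        loss += max(0.0, margin + hardest_pos - hardest_neg)
    return loss / len(feats)

def hetero_center_loss(vis_feats, ir_feats, labels):
    """Heterogeneous center loss: for each identity, pull the
    visible-modality feature center and the infrared-modality
    feature center toward each other."""
    loss = 0.0
    ids = np.unique(labels)
    for pid in ids:
        c_v = vis_feats[labels == pid].mean(axis=0)  # visible center
        c_i = ir_feats[labels == pid].mean(axis=0)   # infrared center
        loss += np.sum((c_v - c_i) ** 2)             # squared distance between centers
    return loss / len(ids)
```

The triplet term operates at the sample level, while the center term operates at the class-center level; in the paper the two are applied jointly to local features in the common subspace.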
Data availability
Enquiries about data availability should be directed to the authors.
Acknowledgements
This research is supported in part by the National Natural Science Foundation of China under Grants 62172231 and U20B2065, and by the Natural Science Foundation of Jiangsu Province of China under Grants BK20220107 and BK20211539. This research is also supported in part by the Engineering Research Center of Digital Forensics, Ministry of Education.
Funding
Funding is provided by National Natural Science Foundation of China (Grant Nos. 62172231, U20B2065) and Natural Science Foundation of Jiangsu Province (Grant Nos. BK20220107, BK20211539).
Author information
Authors and Affiliations
Contributions
All authors contributed to the conceptualization and methodology. The writing—original draft, writing—review editing and visualization were performed by Guoqing Zhang and Yinyin Zhang. Investigation and data curation were performed by Yuhao Chen and Hongwei Zhang. Supervision was performed by Yuhui Zheng, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
All the authors declare no conflict of interest.
Human and animal rights
This study does not contain any studies with human participants or animals performed by any of the authors.
Ethical approval
This article does not include any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, G., Zhang, Y., Chen, Y. et al. Multi-granularity feature utilization network for cross-modality visible-infrared person re-identification. Soft Comput (2023). https://doi.org/10.1007/s00500-023-08321-7