Abstract
Person re-identification (ReID) depends critically on accurate descriptions of image features. Attention mechanisms, particularly Transformer-like self-attention (TLSA), have gained favor among researchers because of their strong feature-description performance. However, owing to their intricate structures, TLSA models typically require more computational resources. At the same time, contrastive learning has significantly improved the performance of unsupervised person ReID. Because contrastive learning relies on exploring relationships among multiple samples, batch size is a crucial factor for deep learning methods built on the contrastive learning paradigm. Consequently, when computational resources are limited, traditional TLSA models often struggle to adapt to unsupervised person ReID methods based on contrastive learning. To address these issues, we propose a novel, lightweight Multi-Level Attention (MLA) method that mitigates the computational resource conflict TLSA models face when trained under the contrastive learning paradigm. MLA comprises a lightweight multi-head attention module, assisted by a spatial feature weighting module and an inter-feature cross-attention module. By leveraging the complementary strengths of these attention mechanisms, our approach achieves significant performance improvements on the ReID task. We evaluated the proposed approach on three large-scale real-world person ReID datasets (Market-1501, DukeMTMC-reID, and MSMT17) and on the virtual person ReID dataset PersonX. The experimental results demonstrate that our method outperforms state-of-the-art approaches without relying on supplemental pre-training procedures or additional training data.
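The resource conflict described above can be illustrated with a rough back-of-the-envelope sketch. It is not the MLA method or any measurement from the paper: the function names, the 4-byte value size, and the head, token, and budget figures are all hypothetical, and only the score maps of a single standard self-attention layer are counted (activations, gradients, and weights are ignored).

```python
def attention_map_bytes(batch_size, num_heads, num_tokens, bytes_per_value=4):
    """Bytes needed to store the attention score maps of one self-attention layer.

    Assumption: standard self-attention keeps one (num_tokens x num_tokens)
    score map per head and per image; 4 bytes per value means float32.
    """
    return batch_size * num_heads * num_tokens * num_tokens * bytes_per_value


def max_batch_size(budget_bytes, num_heads, num_tokens, bytes_per_value=4):
    """Largest batch whose score maps fit in a (hypothetical) memory budget."""
    per_image = attention_map_bytes(1, num_heads, num_tokens, bytes_per_value)
    return budget_bytes // per_image


if __name__ == "__main__":
    budget = 2 ** 30  # illustrative 1 GiB budget for score maps only
    for tokens in (128, 256, 512):
        print(tokens, max_batch_size(budget, num_heads=8, num_tokens=tokens))
```

Under these assumptions the cost grows linearly with batch size but quadratically with the number of tokens, so doubling the token count cuts the affordable batch size to a quarter. That is the tension between attention cost and the large batches that contrastive learning favors.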
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 62272461) and by the China Scholarship Council (Grant No. 202206420034), which awarded Yi Zheng a scholarship for one year of study abroad at the Agency for Science, Technology and Research.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zheng, Y., Zhao, J., Zhou, Y. et al. Multi-level self attention for unsupervised learning person re-identification. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19007-z
DOI: https://doi.org/10.1007/s11042-024-19007-z