Multi-level self attention for unsupervised learning person re-identification

Zheng, Yi; Zhao, Jiaqi; Zhou, Yong; Liu, Fayao; Yao, Rui; Zhu, Hancheng; El Saddik, Abdulmotaleb

doi:10.1007/s11042-024-19007-z

Published: 24 April 2024

Multimedia Tools and Applications (2024)

Yi Zheng^1,2,
Jiaqi Zhao^1,2,
Yong Zhou^1,2,
Fayao Liu^3,
Rui Yao^1,2,
Hancheng Zhu^1,2 &
Abdulmotaleb El Saddik^4

Abstract

In recent years, person re-identification (ReID) has come to depend critically on accurate image feature description. Attention mechanisms, particularly Transformer-like self-attention (TLSA), are favored by researchers for their strong feature description performance. However, because of their intricate structures, TLSA models typically require more computational resources. Meanwhile, contrastive learning has significantly improved the performance of unsupervised person ReID. Contrastive learning relies on exploring relationships among multiple samples, which makes batch size a crucial factor for deep learning methods built on the contrastive learning paradigm. Consequently, when computational resources are limited, traditional TLSA models often struggle to adapt to unsupervised person ReID methods based on this paradigm. To address these issues, we propose a novel and lightweight Multi-Level Attention (MLA) method, which effectively mitigates the computational resource conflicts that arise when a TLSA model is trained under the contrastive learning paradigm. MLA comprises a lightweight multi-head attention module, assisted by a spatial feature weighting module and an inter-feature cross-attention module. By fully leveraging the complementary strengths of these attention mechanisms, our approach achieves significant performance improvements in the ReID task. We evaluated the proposed approach on three large-scale real person ReID datasets, namely Market-1501, DukeMTMC-reID, and MSMT17, and on the virtual person ReID dataset PersonX. The experimental results demonstrate that our method outperforms state-of-the-art approaches without relying on supplemental pre-training procedures or additional training data.
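To make the resource conflict described in the abstract concrete, the following is a minimal sketch of generic single-head scaled dot-product self-attention in NumPy, together with a rough estimate of the memory needed to hold the attention weights. It is not the MLA module proposed in the paper. The tensor sizes, the helper name `attention_map_bytes`, and the choice of 8 heads and 128 tokens are illustrative assumptions only.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Generic single-head scaled dot-product self-attention.

    x: (batch, tokens, dim); w_q, w_k, w_v: (dim, head_dim).
    Returns the attended features and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Pairwise token similarities: (batch, tokens, tokens).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the last axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def attention_map_bytes(batch, heads, tokens, bytes_per_value=4):
    """Bytes needed to store one layer's attention weights (hypothetical helper).

    The count grows linearly with batch size and quadratically with tokens.
    """
    return batch * heads * tokens * tokens * bytes_per_value


# Illustrative sizes only; they are not taken from the paper.
rng = np.random.default_rng(0)
batch, tokens, dim = 4, 128, 64
x = rng.standard_normal((batch, tokens, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)

# Larger batches, which contrastive learning tends to favor, scale this cost.
for b in (32, 256):
    mib = attention_map_bytes(b, heads=8, tokens=128) / 2**20
    print(f"batch={b}: {mib:.0f} MiB per attention layer")
```

Under these assumptions, moving from a batch of 32 to a batch of 256 multiplies the attention-weight memory of every layer by eight, before activations and gradients are counted. This is the kind of pressure that a lightweight attention design aims to relieve.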

Availability of data and materials

The datasets used in this study are sourced from the following works: Market-1501 [48], DukeMTMC-reID [49], MSMT17 [50], PersonX [51], and VeRi-776 [68]. The data can be obtained from the download links in the “Prepare Datasets” section at https://github.com/alibaba/cluster-contrast-reid/tree/main.

References

Yan C, Pang G, Bai X, Liu C, Ning X, Gu L, Zhou J (2021) Beyond triplet loss: person re-identification with fine-grained difference-aware pairwise loss. IEEE Trans Multimedia 24:1665–1677
Liu W, Chang X, Chen L, Phung D, Zhang X, Yang Y, Hauptmann AG (2020) Pair-based uncertainty and diversity promoting early active learning for person re-identification. ACM Trans Intell Syst Technol 11(2):1–15
Liu H, Tan X, Zhou X (2020) Parameter sharing exploration and hetero-center triplet loss for visible-thermal person re-identification. IEEE Trans Multimedia 23:4414–4425
Liu H, Chai Y, Tan X, Li D, Zhou X (2021) Strong but simple baseline with dual-granularity triplet loss for visible-thermal person re-identification. IEEE Signal Process Lett 28:653–657
Qi L, Wang L, Huo J, Shi Y, Gao Y (2021) GreyReID: a novel two-stream deep framework with RGB-grey information for person re-identification. ACM Trans Multimed Comput Commun Appl 17(1):1–22
Yang X, Liu L, Wang N, Gao X (2021) A two-stream dynamic pyramid representation model for video-based person re-identification. IEEE Trans Image Process 30:6266–6276
Zheng Z, Zheng L, Yang Y (2017) A discriminatively learned CNN embedding for person reidentification. ACM Trans Multimed Comput Commun Appl 14(1):1–20
Zheng Y, Zhou Y, Zhao J, Jian M, Yao R, Liu B, Liu X (2021) A Siamese pedestrian alignment network for person re-identification. Multimed Tools Appl 80:33951–33970
Yang J, Zheng WS, Yang Q, Chen YC, Tian Q (2020) Spatial-temporal graph convolutional network for video-based person re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3289–3299
Zhang Z, Zhang H, Liu S (2021) Person re-identification using heterogeneous local graph attention networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12136–12145
Ahmad S, Scarpellini G, Morerio P, Del Bue A (2022) Event-driven Re-Id: a new benchmark and method towards privacy-preserving person re-identification. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 459–468
Zhao B, Li Y, Liu X, Pang HH, Deng RH (2022) FREED: an efficient privacy-preserving solution for person re-identification. In: 2022 IEEE Conference on Dependable and Secure Computing (DSC), pp 1–8. IEEE
Liu X, Yoo C, Xing F, Oh H, El Fakhri G, Kang JW, Woo J et al (2022) Deep unsupervised domain adaptation: a review of recent advances and perspectives. APSIPA Trans Signal Inf Process 11(1)
Tang H, Wang Y, Jia K (2022) Unsupervised domain adaptation via distilled discriminative clustering. Pattern Recognit 127:108638
Prasad M, Balakrishnan R et al (2022) Spatio-Temporal association rule based deep annotation-free clustering (STAR-DAC) for unsupervised person re-identification. Pattern Recognit 122:108287
Zheng Y, Zhou Y, Zhao J, Chen Y, Yao R, Liu B, Saddik AE (2022) Clustering matters: sphere feature for fully unsupervised person re-identification. ACM Trans Multimed Comput Commun Appl 18(4):1–18
Dai Z, Wang G, Yuan W, Zhu S, Tan P (2022) Cluster contrast for unsupervised person re-identification. In: Proceedings of the Asian conference on computer vision, pp 1142–1160
Zhang H, Zhang G, Chen Y, Zheng Y (2022) Global relation-aware contrast learning for unsupervised person re-identification. IEEE Trans Circuits Syst Video Technol 32(12):8599–8610
Jaderberg M, Simonyan K, Zisserman A et al (2015) Spatial transformer networks. Adv Neural Inf Process Syst 28
Yang J, Zhang C, Tang Y, Li Z (2022) PAFM: pose-drive attention fusion mechanism for occluded person re-identification. Neural Comput 34(10):8241–8252
Zhang Z, Lan C, Zeng W, Jin X, Chen Z (2020) Relation-aware global attention for person re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3186–3195
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Zhu Y, Yang W, Wang L, Chen D, Wang M, Wei F, KeZiErBieKe H, Liao Y (2023) Multiscale global-aware channel attention for person re-identification. J Vis Commun Image Represent 90:103714
Wang K, Ding C, Pang J, Xu X (2023) Context sensing attention network for video-based person re-identification. ACM Trans Multimed Comput Commun Appl 19(4):1–20
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929
Luo H, Wang P, Xu Y, Ding F, Zhou Y, Wang F, Li H, Jin R (2021) Self-supervised pre-training for transformer-based person re-identification. arXiv:2111.12084
Zhu K, Guo H, Yan T, Zhu Y, Wang J, Tang M (2022) PASS: part-aware self-supervised pre-training for person re-identification. In: European conference on computer vision, pp 198–214
Li Y, He J, Zhang T, Liu X, Zhang Y, Wu F (2021) Diverse part discovery: occluded person re-identification with part-aware transformer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2898–2907
Tang Z, Zhang R, Peng Z, Chen J, Lin L (2022) Multi-stage spatio-temporal aggregation transformer for video person re-identification. IEEE Trans Multimedia
Hou H, Zhou Y, Zhao J, Yao R, Chen Y, Zheng Y, El Saddik A (2021) Unsupervised cross-domain person re-identification with self-attention and joint-flexible optimization. Image Vis Comput 111:104191
Cheng D, Li J, Kou Q, Zhao K, Liu R (2022) H-net: unsupervised domain adaptation person re-identification network based on hierarchy. Image Vis Comput 104493
Yun X, Wang Q, Cheng X, Song K, Sun Y (2023) Discrepant mutual learning fusion network for unsupervised domain adaptation on person re-identification. Appl Intell 53(3):2951–2966
Chen S, Qiu L, Tian Z, Yan Y, Wang DH, Zhu S (2023) MTNet: mutual tri-training network for unsupervised domain adaptation on person re-identification. J Vis Commun Image Represent 90:103749
Li J, Wang M, Gong X (2023) Transformer based multi-grained features for unsupervised person re-identification. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 42–50
Fan H, Zheng L, Yan C, Yang Y (2018) Unsupervised person re-identification: clustering and fine-tuning. ACM Trans Multimed Comput Commun Appl 14(4):1–18
Ding G, Khan S, Tang Z, Zhang J, Porikli F (2019) Towards better validity: dispersion based clustering for unsupervised person re-identification. arXiv:1906.01308
Lin Y, Dong X, Zheng L, Yan Y, Yang Y (2019) A bottom-up clustering approach to unsupervised person re-identification. Proceedings of the AAAI Conference on Artificial Intelligence 33:8738–8745
Cho Y, Kim WJ, Hong S, Yoon SE (2022) Part-based pseudo label refinement for unsupervised person re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7308–7318
Chen G, Gu T, Lu J, Bao JA, Zhou J (2021) Person re-identification via attention pyramid. IEEE Trans Image Process 30:7663–7676
Lin T, Wang Y, Liu X, Qiu X (2022) A survey of transformers. AI Open
Zhang Q, Yang YB (2021) ResT: an efficient transformer for visual recognition. Adv Neural Inf Process Syst 34:15475–15485
Guo MH, Liu ZN, Mu TJ, Hu SM (2022) Beyond self-attention: external attention using two linear layers for visual tasks. IEEE Trans Pattern Anal Mach Intell 45(5):5436–5447
He S, Luo H, Wang P, Wang F, Li H, Jiang W (2021) TransReID: transformer-based object re-identification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 15013–15022
Fu D, Chen D, Bao J, Yang H, Yuan L, Zhang L, Li H, Chen D (2021) Unsupervised pre-training for person re-identification. Proceedings of the IEEE conference on computer vision and pattern recognition
Ester M, Kriegel HP, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, vol 96, pp 226–231
Srinivas A, Lin TY, Parmar N, Shlens J, Abbeel P, Vaswani A (2021) Bottleneck transformers for visual recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 16519–16529
Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q (2015) Scalable person re-identification: a benchmark. In: Proceedings of the IEEE international conference on computer vision, pp 1116–1124
Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C (2016) Performance measures and a data set for multi-target, multi-camera tracking. In: European conference on computer vision, pp 17–35. Springer
Wei L, Zhang S, Gao W, Tian Q (2018) Person transfer GAN to bridge domain gap for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 79–88
Sun X, Zheng L (2019) Dissecting person re-identification from the viewpoint of viewpoint. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 608–617
Zheng L, Yang Y, Hauptmann AG (2016) Person re-identification: past, present and future. arXiv:1610.02984
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255. IEEE
Lin Y, Xie L, Wu Y, Yan C, Tian Q (2020) Unsupervised person re-identification via softened similarity learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3390–3399
Wang D, Zhang S (2020) Unsupervised person re-identification via multi-label classification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10981–10990
Zeng K, Ning M, Wang Y, Guo Y (2020) Hierarchical clustering with hard-batch triplet loss for person re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13657–13665
Wang Z, Zhang J, Zheng L, Liu Y, Sun Y, Li Y, Wang S (2020) CycAs: self-supervised Cycle Association for Learning Re-identifiable Descriptions. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16, pp 72–88. Springer
Wu J, Yang Y, Liu H, Liao S, Lei Z, Li SZ (2019) Unsupervised graph association for person re-identification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8321–8330
Ge Y, Zhu F, Chen D, Zhao R et al (2020) Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. Adv Neural Inf Process Syst 33:11309–11321
Ge Y, Chen D, Li H (2020) Mutual mean-teaching: pseudo label refinery for unsupervised domain adaptation on person re-identification. arXiv:2001.01526
Han X, Yu X, Li G, Zhao J, Pan G, Ye Q, Jiao J, Han Z (2022) Rethinking sampling strategies for unsupervised person re-identification. IEEE Trans Image Process 32:29–42
Chen H, Lagadec B, Bremond F (2021) ICE: inter-instance contrastive encoding for unsupervised person re-identification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 14960–14969
Cheng D, Zhou J, Wang N, Gao X (2022) Hybrid dynamic contrast and probability distillation for unsupervised person re-id. IEEE Trans Image Process 31:3334–3346
Zhong Z, Zheng L, Luo Z, Li S, Yang Y (2019) Invariance matters: exemplar memory for domain adaptive person re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 598–607
Li M, Zhu X, Gong S (2018) Unsupervised person re-identification by deep learning tracklet association. In: Proceedings of the European conference on computer vision (ECCV), pp 737–753
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, pp 618–626
Liu X, Liu W, Ma H, Fu H (2016) Large-scale vehicle re-identification in urban surveillance videos. In: 2016 IEEE international conference on multimedia and expo (ICME), pp 1–6. IEEE

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62272461), and by the China Scholarship Council (Grant No. 202206420034) which awarded Yi Zheng a scholarship for 1 year of study abroad at the Agency for Science, Technology and Research.

Author information

Authors and Affiliations

Engineering Research Center of Mine Digitization of the Ministry of Education of the People’s Republic of China, China University of Mining and Technology, No. 1, Daxue Road, Xuzhou, Jiangsu, 221116, People’s Republic of China
Yi Zheng, Jiaqi Zhao, Yong Zhou, Rui Yao & Hancheng Zhu
School of Computer Science and Technology, China University of Mining and Technology, No. 1, Daxue Road, Xuzhou, Jiangsu, 221116, People’s Republic of China
Yi Zheng, Jiaqi Zhao, Yong Zhou, Rui Yao & Hancheng Zhu
Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore, Singapore
Fayao Liu
Multimedia Communications Research Laboratory (MCRLab), University of Ottawa, 800 King Edward, Ottawa, Ontario, K1N 6N5, Canada
Abdulmotaleb El Saddik

Corresponding author

Correspondence to Yong Zhou.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zheng, Y., Zhao, J., Zhou, Y. et al. Multi-level self attention for unsupervised learning person re-identification. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19007-z

Received: 05 January 2024
Revised: 28 February 2024
Accepted: 19 March 2024
Published: 24 April 2024
DOI: https://doi.org/10.1007/s11042-024-19007-z