Multi-level self attention for unsupervised learning person re-identification

Multimedia Tools and Applications

Abstract

Person re-identification (ReID) depends critically on accurate image feature description. Attention mechanisms, particularly Transformer-like self-attention (TLSA), are widely used for their strong descriptive power, but their intricate structure typically demands substantial computational resources. Meanwhile, contrastive learning has significantly improved unsupervised person ReID. Because contrastive learning works by exploring the relationships among many samples, batch size is a crucial factor for methods built on this paradigm. Consequently, when computational resources are limited, conventional TLSA models are difficult to combine with contrastive-learning-based unsupervised person ReID methods. To address this, we propose a novel, lightweight Multi-Level Attention (MLA) method that mitigates the computational resource conflict that arises when training TLSA models under the contrastive learning paradigm. MLA consists of a lightweight multi-head attention module assisted by a spatial feature weighting module and an inter-feature cross-attention module. By exploiting the complementary strengths of these attention mechanisms, our approach achieves significant performance improvements on the ReID task. We evaluate it on three large-scale real-world person ReID datasets (Market-1501, DukeMTMC-reID, and MSMT17) and on the virtual person ReID dataset PersonX. Experimental results show that our method outperforms state-of-the-art approaches without relying on supplemental pre-training procedures or additional training data.
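The tension the abstract describes between self-attention cost and contrastive batch size can be illustrated with a minimal NumPy sketch. This is generic scaled dot-product multi-head self-attention, not the authors' MLA module; the function names, the token count, and the batch sizes below are hypothetical values chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, num_heads):
    """Generic multi-head self-attention (illustrative, not the paper's MLA).

    x: (batch, tokens, dim); w_q, w_k, w_v: (dim, dim).
    Returns the attended features and the attention maps.
    """
    b, n, d = x.shape
    head_dim = d // num_heads

    def split_heads(t):
        # (batch, tokens, dim) -> (batch, heads, tokens, head_dim)
        return t.reshape(b, n, num_heads, head_dim).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    # Every token attends to every other token: (batch, heads, tokens, tokens).
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)
    attn = softmax(scores, axis=-1)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(b, n, d)
    return out, attn


def attention_map_floats(batch, tokens, num_heads):
    # Number of attention weights stored for one layer in one forward pass.
    return batch * num_heads * tokens * tokens


rng = np.random.default_rng(0)
dim, heads, tokens = 64, 4, 128  # hypothetical sizes
x = rng.standard_normal((2, tokens, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, heads)

# Attention-map storage grows linearly with batch size and quadratically
# with the number of tokens.
for batch in (32, 256):
    print(batch, attention_map_floats(batch, tokens, heads))
```

Under these assumed sizes, growing the batch from 32 to 256 multiplies the attention-map storage by eight for every attention layer, which is the kind of pressure that makes large contrastive batches hard to combine with full self-attention on limited hardware.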

Availability of data and materials

The datasets used in this study are sourced from the following works: Market-1501 [48], DukeMTMC-reID [49], MSMT17 [50], PersonX [51], and VeRi-776 [68]. The data can be obtained from the download links in the “Prepare Datasets” section at https://github.com/alibaba/cluster-contrast-reid/tree/main.

References

  1. Yan C, Pang G, Bai X, Liu C, Ning X, Gu L, Zhou J (2021) Beyond triplet loss: person re-identification with fine-grained difference-aware pairwise loss. IEEE Trans Multimedia 24:1665–1677

  2. Liu W, Chang X, Chen L, Phung D, Zhang X, Yang Y, Hauptmann AG (2020) Pair-based uncertainty and diversity promoting early active learning for person re-identification. ACM Trans Intell Syst Technol 11(2):1–15

  3. Liu H, Tan X, Zhou X (2020) Parameter sharing exploration and hetero-center triplet loss for visible-thermal person re-identification. IEEE Trans Multimedia 23:4414–4425

  4. Liu H, Chai Y, Tan X, Li D, Zhou X (2021) Strong but simple baseline with dual-granularity triplet loss for visible-thermal person re-identification. IEEE Signal Process Lett 28:653–657

  5. Qi L, Wang L, Huo J, Shi Y, Gao Y (2021) GreyReID: a novel two-stream deep framework with RGB-grey information for person re-identification. ACM Trans Multimed Comput Commun Appl 17(1):1–22

  6. Yang X, Liu L, Wang N, Gao X (2021) A two-stream dynamic pyramid representation model for video-based person re-identification. IEEE Trans Image Process 30:6266–6276

  7. Zheng Z, Zheng L, Yang Y (2017) A discriminatively learned CNN embedding for person reidentification. ACM Trans Multimed Comput Commun Appl 14(1):1–20

  8. Zheng Y, Zhou Y, Zhao J, Jian M, Yao R, Liu B, Liu X (2021) A Siamese pedestrian alignment network for person re-identification. Multimed Tools Appl 80:33951–33970

  9. Yang J, Zheng WS, Yang Q, Chen YC, Tian Q (2020) Spatial-temporal graph convolutional network for video-based person re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3289–3299

  10. Zhang Z, Zhang H, Liu S (2021) Person re-identification using heterogeneous local graph attention networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12136–12145

  11. Ahmad S, Scarpellini G, Morerio P, Del Bue A (2022) Event-driven Re-Id: a new benchmark and method towards privacy-preserving person re-identification. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 459–468

  12. Zhao B, Li Y, Liu X, Pang HH, Deng RH (2022) FREED: an efficient privacy-preserving solution for person re-identification. In: 2022 IEEE Conference on Dependable and Secure Computing (DSC), pp 1–8. IEEE

  13. Liu X, Yoo C, Xing F, Oh H, El Fakhri G, Kang JW, Woo J et al (2022) Deep unsupervised domain adaptation: a review of recent advances and perspectives. APSIPA Trans Signal Inf Process 11(1)

  14. Tang H, Wang Y, Jia K (2022) Unsupervised domain adaptation via distilled discriminative clustering. Pattern Recognit 127:108638

  15. Prasad M, Balakrishnan R et al (2022) Spatio-Temporal association rule based deep annotation-free clustering (STAR-DAC) for unsupervised person re-identification. Pattern Recognit 122:108287

  16. Zheng Y, Zhou Y, Zhao J, Chen Y, Yao R, Liu B, Saddik AE (2022) Clustering matters: sphere feature for fully unsupervised person re-identification. ACM Trans Multimed Comput Commun Appl 18(4):1–18

  17. Dai Z, Wang G, Yuan W, Zhu S, Tan P (2022) Cluster contrast for unsupervised person re-identification. In: Proceedings of the Asian conference on computer vision, pp 1142–1160

  18. Zhang H, Zhang G, Chen Y, Zheng Y (2022) Global relation-aware contrast learning for unsupervised person re-identification. IEEE Trans Circuits Syst Video Technol 32(12):8599–8610

  19. Jaderberg M, Simonyan K, Zisserman A et al (2015) Spatial transformer networks. Adv Neural Inf Process Syst 28

  20. Yang J, Zhang C, Tang Y, Li Z (2022) PAFM: pose-drive attention fusion mechanism for occluded person re-identification. Neural Comput Appl 34(10):8241–8252

  21. Zhang Z, Lan C, Zeng W, Jin X, Chen Z (2020) Relation-aware global attention for person re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3186–3195

  22. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141

  23. Zhu Y, Yang W, Wang L, Chen D, Wang M, Wei F, KeZiErBieKe H, Liao Y (2023) Multiscale global-aware channel attention for person re-identification. J Vis Commun Image Represent 90:103714

  24. Wang K, Ding C, Pang J, Xu X (2023) Context sensing attention network for video-based person re-identification. ACM Trans Multimed Comput Commun Appl 19(4):1–20

  25. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30

  26. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929

  27. Luo H, Wang P, Xu Y, Ding F, Zhou Y, Wang F, Li H, Jin R (2021) Self-supervised pre-training for transformer-based person re-identification. arXiv:2111.12084

  28. Zhu K, Guo H, Yan T, Zhu Y, Wang J, Tang M (2022) PASS: part-aware self-supervised pre-training for person re-identification. In: European conference on computer vision, pp 198–214

  29. Li Y, He J, Zhang T, Liu X, Zhang Y, Wu F (2021) Diverse part discovery: occluded person re-identification with part-aware transformer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2898–2907

  30. Tang Z, Zhang R, Peng Z, Chen J, Lin L (2022) Multi-stage spatio-temporal aggregation transformer for video person re-identification. IEEE Trans Multimedia

  31. Hou H, Zhou Y, Zhao J, Yao R, Chen Y, Zheng Y, El Saddik A (2021) Unsupervised cross-domain person re-identification with self-attention and joint-flexible optimization. Image Vis Comput 111:104191

  32. Cheng D, Li J, Kou Q, Zhao K, Liu R (2022) H-net: unsupervised domain adaptation person re-identification network based on hierarchy. Image Vis Comput 104493

  33. Yun X, Wang Q, Cheng X, Song K, Sun Y (2023) Discrepant mutual learning fusion network for unsupervised domain adaptation on person re-identification. Appl Intell 53(3):2951–2966

  34. Chen S, Qiu L, Tian Z, Yan Y, Wang DH, Zhu S (2023) MTNet: mutual tri-training network for unsupervised domain adaptation on person re-identification. J Vis Commun Image Represent 90:103749

  35. Li J, Wang M, Gong X (2023) Transformer based multi-grained features for unsupervised person re-identification. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 42–50

  36. Fan H, Zheng L, Yan C, Yang Y (2018) Unsupervised person re-identification: clustering and fine-tuning. ACM Trans Multimed Comput Commun Appl 14(4):1–18

  37. Ding G, Khan S, Tang Z, Zhang J, Porikli F (2019) Towards better validity: dispersion based clustering for unsupervised person re-identification. arXiv:1906.01308

  38. Lin Y, Dong X, Zheng L, Yan Y, Yang Y (2019) A bottom-up clustering approach to unsupervised person re-identification. Proceedings of the AAAI Conference on Artificial Intelligence 33:8738–8745

  39. Cho Y, Kim WJ, Hong S, Yoon SE (2022) Part-based pseudo label refinement for unsupervised person re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7308–7318

  40. Chen G, Gu T, Lu J, Bao JA, Zhou J (2021) Person re-identification via attention pyramid. IEEE Trans Image Process 30:7663–7676

  41. Lin T, Wang Y, Liu X, Qiu X (2022) A survey of transformers. AI Open

  42. Zhang Q, Yang YB (2021) ResT: an efficient transformer for visual recognition. Adv Neural Inf Process Syst 34:15475–15485

  43. Guo MH, Liu ZN, Mu TJ, Hu SM (2022) Beyond self-attention: external attention using two linear layers for visual tasks. IEEE Trans Pattern Anal Mach Intell 45(5):5436–5447

  44. He S, Luo H, Wang P, Wang F, Li H, Jiang W (2021) TransReID: transformer-based object re-identification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 15013–15022

  45. Fu D, Chen D, Bao J, Yang H, Yuan L, Zhang L, Li H, Chen D (2021) Unsupervised pre-training for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition

  46. Ester M, Kriegel HP, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, vol 96, pp 226–231

  47. Srinivas A, Lin TY, Parmar N, Shlens J, Abbeel P, Vaswani A (2021) Bottleneck transformers for visual recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 16519–16529

  48. Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q (2015) Scalable person re-identification: a benchmark. In: Proceedings of the IEEE international conference on computer vision, pp 1116–1124

  49. Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C (2016) Performance measures and a data set for multi-target, multi-camera tracking. In: European conference on computer vision, pp 17–35. Springer

  50. Wei L, Zhang S, Gao W, Tian Q (2018) Person transfer GAN to bridge domain gap for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 79–88

  51. Sun X, Zheng L (2019) Dissecting person re-identification from the viewpoint of viewpoint. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 608–617

  52. Zheng L, Yang Y, Hauptmann AG (2016) Person re-identification: past, present and future. arXiv:1610.02984

  53. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  54. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255. IEEE

  55. Lin Y, Xie L, Wu Y, Yan C, Tian Q (2020) Unsupervised person re-identification via softened similarity learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3390–3399

  56. Wang D, Zhang S (2020) Unsupervised person re-identification via multi-label classification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10981–10990

  57. Zeng K, Ning M, Wang Y, Guo Y (2020) Hierarchical clustering with hard-batch triplet loss for person re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13657–13665

  58. Wang Z, Zhang J, Zheng L, Liu Y, Sun Y, Li Y, Wang S (2020) CycAs: self-supervised cycle association for learning re-identifiable descriptions. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16, pp 72–88. Springer

  59. Wu J, Yang Y, Liu H, Liao S, Lei Z, Li SZ (2019) Unsupervised graph association for person re-identification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8321–8330

  60. Ge Y, Zhu F, Chen D, Zhao R et al (2020) Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. Adv Neural Inf Process Syst 33:11309–11321

  61. Ge Y, Chen D, Li H (2020) Mutual mean-teaching: pseudo label refinery for unsupervised domain adaptation on person re-identification. arXiv:2001.01526

  62. Han X, Yu X, Li G, Zhao J, Pan G, Ye Q, Jiao J, Han Z (2022) Rethinking sampling strategies for unsupervised person re-identification. IEEE Trans Image Process 32:29–42

  63. Chen H, Lagadec B, Bremond F (2021) ICE: inter-instance contrastive encoding for unsupervised person re-identification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 14960–14969

  64. Cheng D, Zhou J, Wang N, Gao X (2022) Hybrid dynamic contrast and probability distillation for unsupervised person re-id. IEEE Trans Image Process 31:3334–3346

  65. Zhong Z, Zheng L, Luo Z, Li S, Yang Y (2019) Invariance matters: exemplar memory for domain adaptive person re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 598–607

  66. Li M, Zhu X, Gong S (2018) Unsupervised person re-identification by deep learning tracklet association. In: Proceedings of the European conference on computer vision (ECCV), pp 737–753

  67. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, pp 618–626

  68. Liu X, Liu W, Ma H, Fu H (2016) Large-scale vehicle re-identification in urban surveillance videos. In: 2016 IEEE international conference on multimedia and expo (ICME), pp 1–6. IEEE

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62272461), and by the China Scholarship Council (Grant No. 202206420034) which awarded Yi Zheng a scholarship for 1 year of study abroad at the Agency for Science, Technology and Research.

Author information

Corresponding author

Correspondence to Yong Zhou.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Cite this article

Zheng, Y., Zhao, J., Zhou, Y. et al. Multi-level self attention for unsupervised learning person re-identification. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19007-z
