Abstract
Most unsupervised person re-identification (ReID) approaches combine clustering-based pseudo-label prediction with feature learning, and alternate between the two steps to train ReID models. However, incorrect or noisy pseudo-labels often arise from variations in, e.g., human pose, illumination, and viewpoint. Such noisy pseudo-labels may harm the trained ReID models. To exploit this diverse information while minimizing the negative influence of noisy pseudo-labels, we propose a confidence-adapted meta-interaction (CAMI) method that explicitly explores the interaction between the believable supervision (reliable pseudo-labels) and the diverse information. Specifically, CAMI iteratively trains the ReID model in a meta-learning manner, in which the training images are dynamically divided into a reliable set and an unreliable set. At each iteration, the pseudo-labels of the images are predicted by clustering, and the training images are divided by the proposed confidence-adapted sample disentanglement (CASD) method. To adapt to changes in the pseudo-labels and gradually refine the division, CASD dynamically predicts the pseudo-label confidence and divides the training images into the reliable set (high-confidence pseudo-labels) and the unreliable set (low-confidence pseudo-labels). A meta-interaction method is then proposed for training the ReID model; it consists of a meta-training step that uses the believable supervision of the reliable set and a meta-testing step that uses the diverse information of the unreliable set. Meanwhile, a bridge model is dynamically built to refine the unreliable set based on the believable supervision from the reliable set. CAMI is evaluated under two unsupervised person ReID settings, image-based and video-based. Experimental results on four datasets demonstrate the superiority of the proposed CAMI.
References
Karanam S, Gou M, Wu Z, Rates-Borras A, Camps OI, Radke RJ (2019) A systematic evaluation and benchmark for person re-identification: Features, metrics, and datasets. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(3):523–536
Song W, Zheng J, Wu Y, Chen C, Liu F (2021) Discriminative feature extraction for video person re-identification via multi-task network. Appl Intell 51(2):788–803
Pang Z, Guo J, Sun W, Xiao Y, Yu M (2022) Cross-domain person re-identification by hybrid supervised and unsupervised learning. Appl Intell 52(3):2987–3001
Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C (2016) Performance measures and a data set for multi-target, multi-camera tracking. In European conference on computer vision. Springer, pp 17–35
Zheng L, Bie Z, Sun Y, Wang J, Su C, Wang S, Tian Q (2016) Mars: A video benchmark for large-scale person re-identification. In European Conference on Computer Vision. Springer, pp 868–884
Yang J, Zheng W-S, Yang Q, Chen Y-C, Tian Q (2020) Spatial-temporal graph convolutional network for video-based person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 3289–3299
Lin Y, Xie L, Wu Y, Yan C, Tian Q (2020) Unsupervised person re-identification via softened similarity learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 3390–3399
Yang F, Zhong Z, Luo Z, Cai Y, Lin Y, Li S, Sebe N (2021) Joint noise-tolerant learning and meta camera shift adaptation for unsupervised person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 4855–4864
Yu J, Oh H (2022) Graph-structure based multi-label prediction and classification for unsupervised person re-identification. Appl Intell 1–13
Lin Y, Dong X, Zheng L, Yan Y, Yang Y (2019) A bottom-up clustering approach to unsupervised person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, vol 33. pp 8738–8745
Zeng K, Ning M, Wang Y, Guo Y (2020) Hierarchical clustering with hard-batch triplet loss for person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 13657–13665
Zhao F, Liao S, Xie G-S, Zhao J, Zhang K, Shao L (2020) Unsupervised domain adaptation with noise resistible mutual-training for person re-identification. In European Conference on Computer Vision. Springer, pp 526–544
Ge Y, Chen D, Li H (2020) Mutual mean-teaching: Pseudo label refinery for unsupervised domain adaptation on person re-identification. In International Conference on Learning Representations
Zheng K, Lan C, Zeng W, Zhang Z, Zha Z-J (2021) Exploiting sample uncertainty for domain adaptive person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, vol 35. pp 3538–3546
Zheng Y, Zhou Y, Zhao J, Chen Y, Yao R, Liu B, El Saddik A (2022) Clustering matters: Sphere feature for fully unsupervised person re-identification. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 18(4):1–18
Yi L, Liu S, She Q, McLeod AI, Wang B (2022) On learning contrastive representations for learning with noisy labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 16682–16691
Ji Z, Zou X, Lin X, Liu X, Huang T, Wu S (2020) An attention-driven two-stage clustering method for unsupervised person re-identification. In European Conference on Computer Vision. Springer, pp 20–36
Xu S, Luo L, Hu J, Yang B, Hu S (2022) Semantic driven attention network with attribute learning for unsupervised person re-identification. Knowl-Based Syst 252:109354
Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning. pp 1126–1135. PMLR
Zhong Z, Zheng L, Luo Z, Li S, Yang Y (2019) Invariance matters: Exemplar memory for domain adaptive person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 598–607
Zhang H, Cao H, Yang X, Deng C, Tao D (2021) Self-training with progressive representation enhancement for unsupervised cross-domain person re-identification. IEEE Trans Image Process 30:5287–5298
Sun J, Li Y, Chen H, Peng Y, Zhu J (2021) Unsupervised cross domain person re-identification by multi-loss optimization learning. IEEE Trans Image Process 30:2935–2946
Wei P, Zhang C, Tang Y, Li Z, Wang Z (2022) Reinforced domain adaptation with attention and adversarial learning for unsupervised person re-id. Appl Intell 1–15
Ge Y, Zhu F, Chen D, Zhao R et al (2020) Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. Advances in Neural Information Processing Systems 33:11309–11321
Wang M, Lai B, Huang J, Gong X, Hua X-S (2021) Camera-aware proxies for unsupervised person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence 35:2764–2772
Zhong Z, Zheng L, Cao D, Li S (2017) Re-ranking person re-identification with k-reciprocal encoding. In Proceedings of the IEEE conference on computer vision and pattern recognition. pp 1318–1327
Sun Q, Liu Y, Chua T-S, Schiele B (2019) Meta-transfer learning for few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 403–412
Guo J, Zhu X, Zhao C, Cao D, Lei Z, Li SZ (2020) Learning meta face recognition in unseen domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 6163–6172
Li W, Wang S, Lu J, Feng J, Zhou J (2021) Meta-mining discriminative samples for kinship verification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 16135–16144
Zhao Y, Zhong Z, Yang F, Luo Z, Lin Y, Li S, Sebe N (2021) Learning to generalize unseen domains via memory-based multi-source meta-learning for person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 6277–6286
Ester M, Kriegel H-P, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, vol 96. pp 226–231
Tarvainen A, Valpola H (2017) Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in neural information processing systems 30
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In International conference on machine learning. pp 1597–1607. PMLR
Li D, Yang Y, Song Y-Z, Hospedales TM (2018) Learning to generalize: Meta-learning for domain generalization. In Thirty-Second AAAI Conference on Artificial Intelligence
Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q (2015) Scalable person re-identification: A benchmark. In Proceedings of the IEEE international conference on computer vision. pp 1116–1124
Wu Y, Lin Y, Dong X, Yan Y, Ouyang W, Yang Y (2018) Exploit the unknown gradually: One-shot video-based person re-identification by stepwise learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp 5177–5186
Li M, Zhu X, Gong S (2019) Unsupervised tracklet person re-identification. IEEE transactions on pattern analysis and machine intelligence 42(7):1770–1782
Wu J, Yang Y, Liu H, Liao S, Lei Z, Li SZ (2019) Unsupervised graph association for person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision. pp 8321–8330
Liu M, Qu L, Nie L, Liu M, Duan L, Chen B (2020) Iterative local-global collaboration learning towards one-shot video person re-identification. IEEE Trans Image Process 29:9360–9372
Nikhal K, Riggan BS (2022) Multi-context grouped attention for unsupervised person re-identification. IEEE Transactions on Biometrics, Behavior, and Identity Science
Xie P, Xu X, Wang Z, Yamasaki T (2022) Sampling and re-weighting: Towards diverse frame aware unsupervised video person re-identification. IEEE Transactions on Multimedia
Pang B, Zhai D, Jiang J, Liu X (2022) Fully unsupervised person re-identification via selective contrastive learning. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 18(2):1–15
Zeng S, Wang X, Liu M, Liu Q, Wang Y (2022) Anchor association learning for unsupervised video person re-identification. IEEE Transactions on Neural Networks and Learning Systems
Wang D, Zhang S (2020) Unsupervised person re-identification via multi-label classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 10981–10990
Qi L, Wang L, Huo J, Shi Y, Geng X, Gao Y (2021) Adversarial camera alignment network for unsupervised cross-camera person re-identification. IEEE Transactions on Circuits and Systems for Video Technology
Prasad MVNK, Balakrishnan R et al (2022) Spatio-temporal association rule based deep annotation-free clustering (star-dac) for unsupervised person re-identification. Pattern Recogn 122:108287
Li Q, Peng X, Qiao Y, Hao Q (2022) Unsupervised person re-identification with multi-label learning guided self-paced clustering. Pattern Recogn 125:108521
Beeferman D, Berger A (2000) Agglomerative clustering of a search engine query log. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. pp 407–416
Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proceedings of the national academy of sciences 105(4):1118–1123
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant U2034211 and Grant 62006017.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Appendices
This appendix gives further analysis of CAMI in three aspects: (1) evaluation of the weighted contrastive loss and the mean teacher model; (2) evaluation with other clustering approaches; (3) t-SNE visualization of person image features.
1.1 Further ablation study
Effectiveness of the weighted contrastive loss. To evaluate the impact of the weight in the weighted contrastive loss, we remove the weight during training, i.e., ‘CAMI w/o weight’. As shown in Table 4, CAMI outperforms ‘CAMI w/o weight’. Specifically, the weight brings mAP improvements of 3.0% and 2.9%, and Rank-1 accuracy improvements of 2.3% and 0.8%, on the two datasets, respectively. The improvement arises because the weighted contrastive loss takes the reliability of the pseudo-labels into account and thus further relieves the negative influence of noisy pseudo-labels.
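As a rough illustration of the idea, and not the paper's exact formulation, the sketch below weights a per-sample, cluster-level contrastive (InfoNCE-style) loss by a pseudo-label confidence in [0, 1]. The function name, the centroid-based form of the loss, the temperature value, and the normalization by the sum of weights are all assumptions made for illustration.

```python
import math

def weighted_contrastive_loss(features, labels, centroids, confidences,
                              temperature=0.05):
    """Confidence-weighted InfoNCE over cluster centroids (illustrative only).

    features:    list of L2-normalized feature vectors
    labels:      pseudo-label (centroid index) for each feature
    centroids:   list of L2-normalized cluster centroids
    confidences: per-sample pseudo-label confidence in [0, 1] (assumed given)
    """
    total, weight_sum = 0.0, 0.0
    for feat, label, conf in zip(features, labels, confidences):
        # Cosine similarity to every centroid, scaled by the temperature.
        logits = [sum(f * c for f, c in zip(feat, cen)) / temperature
                  for cen in centroids]
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        sample_loss = log_norm - logits[label]
        # Low-confidence samples contribute less to the total.
        total += conf * sample_loss
        weight_sum += conf
    return total / weight_sum if weight_sum > 0 else 0.0

# Toy case: the second sample carries a wrong pseudo-label (0 instead of 1).
features = [[1.0, 0.0], [0.0, 1.0]]
centroids = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 0]
uniform = weighted_contrastive_loss(features, labels, centroids, [1.0, 1.0])
weighted = weighted_contrastive_loss(features, labels, centroids, [1.0, 0.1])
```

In this toy case, lowering the confidence of the mislabeled sample shrinks its share of the loss, which is the intuition behind removing or keeping the weight in the ablation above.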
Effectiveness of the mean teacher. As a basic component of the proposed framework, the mean teacher model is also evaluated experimentally. As shown in Table 4, removing the mean teacher model leads to a consistent performance decline on both datasets. Together, the mean teacher model and the weighted contrastive loss help us build a strong unsupervised person ReID framework.
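For readers unfamiliar with the mean teacher, its core is an exponential moving average (EMA) of the student's weights. The minimal sketch below shows that update on a plain dictionary of scalar parameters; the function name and the momentum value are illustrative assumptions, and a real ReID network would apply the same rule to every weight tensor.

```python
def ema_update(teacher, student, momentum=0.999):
    """Return teacher weights moved toward the student by an EMA step.

    teacher, student: dicts mapping parameter names to floats
    momentum:         fraction of the old teacher weight that is kept (assumed value)
    """
    return {name: momentum * teacher[name] + (1.0 - momentum) * student[name]
            for name in teacher}

# One update step on a single hypothetical parameter "w".
teacher = {"w": 1.0}
student = {"w": 0.0}
teacher = ema_update(teacher, student, momentum=0.9)
```

Because the teacher changes slowly, its predictions are less sensitive to the noise in any single training iteration than the student's.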
1.2 Evaluation on other clustering methods
To verify the effectiveness of the proposed CAMI with different clustering algorithms, we conduct experiments in which the DBSCAN algorithm is replaced with two other clustering approaches: agglomerative clustering (AC) [48] and Infomap [49]. The results are reported in Table 5. Compared to the baseline, CAMI is consistently superior with all clustering methods. DBSCAN achieves the best performance among these clustering approaches, and we adopt it as our pseudo-label generator, following existing work [8].
We also evaluate the pseudo-labels predicted by different clustering methods using normalized mutual information (NMI). The NMI scores are shown in Fig. 10. As training proceeds, the NMI scores rise steadily, which means that the accuracy of the predicted pseudo-labels gradually increases. Furthermore, compared to the baseline, CAMI consistently achieves higher NMI scores with the different clustering approaches.
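To make the metric concrete, the sketch below computes NMI between ground-truth identities and predicted pseudo-labels using only the Python standard library. It normalizes the mutual information by the arithmetic mean of the two label entropies, which is one common convention; the source does not state which normalization it uses, so that choice and the function names are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(true_labels, pseudo_labels):
    """Normalized mutual information, arithmetic-mean normalization (assumed)."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pseudo_labels))
    count_t = Counter(true_labels)
    count_p = Counter(pseudo_labels)
    mi = 0.0
    for (t, p), c in joint.items():
        # p(t, p) * log( p(t, p) / (p(t) * p(p)) )
        mi += (c / n) * math.log(c * n / (count_t[t] * count_p[p]))
    denom = (entropy(true_labels) + entropy(pseudo_labels)) / 2.0
    # Both labelings trivial (a single cluster each): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0

# Same partition under different cluster names versus an unrelated partition.
perfect = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI is invariant to how clusters are named, which makes it suitable for pseudo-labels: cluster indices produced by a clustering algorithm carry no meaning beyond the grouping they induce.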
1.3 t-SNE visualization for person image features
We also visualize the person image features of the baseline and CAMI using t-SNE. We sample 10 different identities from Market-1501, each containing a variable number of images. The visualized results are shown in Fig. 11. CAMI exhibits better intra-class compactness and inter-class separation than the baseline, which suggests that CAMI can train a more robust ReID model.
About this article
Cite this article
Li, X., Li, Q., Xue, W. et al. Confidence-adapted meta-interaction for unsupervised person re-identification. Appl Intell 53, 25525–25542 (2023). https://doi.org/10.1007/s10489-023-04863-3
DOI: https://doi.org/10.1007/s10489-023-04863-3