Abstract
Distinguishing identity-unrelated background information from discriminative identity information is a challenge in unsupervised vehicle re-identification (Re-ID). Continuous scene variations subject Re-ID models to varying degrees of background interference. The recently proposed segment anything model (SAM) has demonstrated exceptional performance in zero-shot segmentation tasks, and combining SAM with vehicle Re-ID models can efficiently separate vehicle identity from background information. This paper proposes a method for unsupervised vehicle Re-ID that combines SAM-driven masked autoencoder (MAE) pre-training with background-aware meta-learning. The method consists of three sub-modules. First, the segmentation capacity of SAM is used to separate the vehicle identity region from the background. Because SAM is not robust in exceptional situations, such as those involving ambiguity or occlusion, a spatially-constrained vehicle background segmentation method is presented to obtain accurate background segmentation results for the downstream vehicle Re-ID task. Second, SAM-driven MAE pre-training uses these segmentation results to select patches belonging to the vehicle and to mask the other patches, allowing the MAE to learn identity-sensitive features in a self-supervised manner. Finally, a background-aware meta-learning method adapts to the varying degrees of background interference in different scenarios by combining different background region ratios. Experiments demonstrate that the proposed method achieves state-of-the-art performance in reducing the effect of background interference variations.
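To make the mask-guided patch selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary vehicle mask is already available (for example, from a segmentation model) and that the image is divided into non-overlapping square patches. The function names, the `fg_threshold` cut-off, and the `bg_ratio` parameter are hypothetical choices made only for illustration.

```python
import numpy as np


def patch_foreground_ratio(mask: np.ndarray, patch: int) -> np.ndarray:
    """Fraction of foreground (vehicle) pixels in each non-overlapping patch."""
    h, w = mask.shape
    if h % patch or w % patch:
        raise ValueError("mask height and width must be multiples of the patch size")
    # Split into a (rows, patch, cols, patch) grid and average inside each patch.
    grid = mask.astype(np.float64).reshape(h // patch, patch, w // patch, patch)
    return grid.mean(axis=(1, 3))


def select_vehicle_patches(mask: np.ndarray, patch: int = 16,
                           fg_threshold: float = 0.5) -> np.ndarray:
    """Flat indices of patches kept visible; all other patches would be masked.

    fg_threshold is an assumed cut-off, not a value taken from the paper.
    """
    ratios = patch_foreground_ratio(mask, patch).ravel()
    return np.flatnonzero(ratios >= fg_threshold)


def patches_with_background(mask: np.ndarray, patch: int, bg_ratio: float,
                            rng: np.random.Generator,
                            fg_threshold: float = 0.5) -> np.ndarray:
    """Vehicle patches plus a random fraction bg_ratio of background patches.

    This mimics the idea of varying the background region ratio; how the
    paper actually samples background regions is not specified here.
    """
    ratios = patch_foreground_ratio(mask, patch).ravel()
    fg = np.flatnonzero(ratios >= fg_threshold)
    bg = np.flatnonzero(ratios < fg_threshold)
    n_bg = int(round(bg_ratio * bg.size))
    if n_bg == 0:
        return fg
    extra = rng.choice(bg, size=n_bg, replace=False)
    return np.sort(np.concatenate([fg, extra]))


if __name__ == "__main__":
    # Toy 4x4 mask with 2x2 patches: only the top-left patch is mostly vehicle.
    toy = np.array([[1, 1, 1, 0],
                    [1, 1, 0, 0],
                    [0, 0, 0, 0],
                    [0, 0, 0, 0]])
    print(select_vehicle_patches(toy, patch=2))
    print(patches_with_background(toy, 2, 1.0, np.random.default_rng(0)))
```

Setting `bg_ratio` to 0 keeps only vehicle patches, while 1 keeps every patch; intermediate values expose a model to different amounts of background.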
![](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs41095-024-0424-2/MediaObjects/41095_2024_424_Fig1_HTML.jpg)
Availability of data and materials
The data presented in this study are available on request from the corresponding author.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant Nos. 62076117 and 62166026, and the Jiangxi Provincial Natural Science Foundation under Grant Nos. 20224BAB212011, 20232BAB212008, and 20232BAB202051.
Author information
Contributions
Dong Wang: Methodology, Writing-Original Draft, Conceptualization, Implementation. Qi Wang: Funding Acquisition, Project Administration, Writing-Review and Editing. Weidong Min: Funding Acquisition, Project Administration, Supervision. Di Gai: Formal Analysis, Visualization. Qing Han: Visualization, Data Curation. Longfei Li: Validation, Software. Yuhan Geng: Validation, Investigation.
Ethics declarations
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Dong Wang received his B.E. degree in software engineering from Nanchang University, China, in 2022. He is currently pursuing an M.E. degree at Nanchang University. His current research interests include computer vision.
Qi Wang received his M.E. and Ph.D. degrees from the School of Information Engineering, Nanchang University, China, in 2018 and 2021, respectively. He is currently a lecturer in the School of Software, Nanchang University. He is also an assistant researcher in the Jiangxi Key Laboratory of Smart City, China. His current research focuses on computer vision and deep learning, particularly object re-identification.
Weidong Min received his B.E., M.E., and Ph.D. degrees in computer applications from Tsinghua University, China, in 1989, 1991, and 1995, respectively. He is currently a professor in the School of Mathematics and Computer Science, and dean of the Institute of Metaverse, Nanchang University. He is also the dean of the Jiangxi Key Laboratory of Smart City, China, and an executive director of the China Society of Image and Graphics. His current research interests include image and video processing, virtual reality, artificial intelligence, big data, and distributed systems.
Di Gai received his M.E. and Ph.D. degrees from the College of Computer Science and Technology, Jilin University, China, in 2018 and 2021, respectively. He is currently a lecturer in the School of Software, Nanchang University. He is also an assistant researcher in the Jiangxi Key Laboratory of Smart City, China. His research interests include medical image processing and pattern recognition, especially image fusion.
Qing Han obtained her B.E. and M.E. degrees in computer applications from Tianjin Polytechnic University, China, in 1997 and 2006, respectively. She is now an associate professor in the School of Mathematics and Computer Science, Nanchang University. Her research interests include image and video processing, and network management.
Longfei Li received his B.E. degree in software engineering from Jiangxi Normal University, China, in 2021. He is currently pursuing an M.E. degree at Nanchang University, China. His current research interests include computer vision.
Yuhan Geng received her B.S. degree in bioinformatics from The Chinese University of Hong Kong, Shenzhen, China, in 2023. She is currently pursuing an M.S. degree at the University of Michigan, Ann Arbor, USA. Her current research interests include computer vision.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
About this article
Cite this article
Wang, D., Wang, Q., Min, W. et al. SAM-driven MAE pre-training and background-aware meta-learning for unsupervised vehicle re-identification. Comp. Visual Media (2024). https://doi.org/10.1007/s41095-024-0424-2
DOI: https://doi.org/10.1007/s41095-024-0424-2