SAM-driven MAE pre-training and background-aware meta-learning for unsupervised vehicle re-identification

Wang, Dong; Wang, Qi; Min, Weidong; Gai, Di; Han, Qing; Li, Longfei; Geng, Yuhan

doi:10.1007/s41095-024-0424-2

Research Article
Open access
Published: 15 August 2024

Computational Visual Media

Dong Wang (1), Qi Wang (2, 3, 4), Weidong Min (2, 3, 4), Di Gai (2, 3, 4), Qing Han (2, 3, 4), Longfei Li (2) & Yuhan Geng (5)

Abstract

Distinguishing identity-unrelated background information from discriminative identity information poses a challenge in unsupervised vehicle re-identification (Re-ID). Re-ID models suffer from varying degrees of background interference caused by continuous scene variations. The recently proposed segment anything model (SAM) has demonstrated exceptional performance in zero-shot segmentation tasks. The combination of SAM and vehicle Re-ID models can achieve efficient separation of vehicle identity and background information. This paper proposes a method that combines SAM-driven masked autoencoder (MAE) pre-training and background-aware meta-learning for unsupervised vehicle Re-ID. The method consists of three sub-modules. First, the segmentation capacity of SAM is utilized to separate the vehicle identity region from the background. SAM cannot be robustly employed in exceptional situations, such as those with ambiguity or occlusion. Thus, in vehicle Re-ID downstream tasks, a spatially constrained vehicle background segmentation method is presented to obtain accurate background segmentation results. Second, SAM-driven MAE pre-training utilizes the aforementioned segmentation results to select patches belonging to the vehicle and to mask other patches, allowing MAE to learn identity-sensitive features in a self-supervised manner. Finally, we present a background-aware meta-learning method to fit varying degrees of background interference in different scenarios by combining different background region ratios. Our experiments demonstrate that the proposed method has state-of-the-art performance in reducing background interference variations.
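
The patch selection step described in the abstract can be illustrated with a minimal sketch. It assumes a binary vehicle mask is already available (for example, from a segmentation model) and keeps as visible only those patches whose vehicle-pixel fraction reaches a threshold; all other patches are masked. The function name `select_vehicle_patches`, the patch size of 16, and the 0.5 threshold are hypothetical choices for illustration, not values taken from the paper, and the authors' actual selection rule may differ.

```python
import numpy as np


def select_vehicle_patches(mask, patch_size=16, min_foreground=0.5):
    """Split patch indices into visible (vehicle) and masked (background).

    mask: 2D boolean array, True where a pixel belongs to the vehicle.
    patch_size and min_foreground are illustrative assumptions.
    Returns two 1D arrays of row-major patch indices.
    """
    h, w = mask.shape
    if h % patch_size or w % patch_size:
        raise ValueError("mask height and width must be multiples of patch_size")
    gh, gw = h // patch_size, w // patch_size
    # View the mask as a (gh, patch, gw, patch) grid and average over the
    # two pixel axes to get the vehicle fraction of every patch.
    frac = (
        mask.astype(np.float32)
        .reshape(gh, patch_size, gw, patch_size)
        .mean(axis=(1, 3))
    )
    keep = frac.reshape(-1) >= min_foreground
    visible = np.flatnonzero(keep)   # patches fed to the encoder
    masked = np.flatnonzero(~keep)   # patches hidden from the encoder
    return visible, masked


# Toy example: a 64x64 image whose vehicle covers rows 16..47, columns 0..31.
toy_mask = np.zeros((64, 64), dtype=bool)
toy_mask[16:48, 0:32] = True
visible, masked = select_vehicle_patches(toy_mask)
print(visible.tolist())  # [4, 5, 8, 9]
print(len(masked))       # 12
```

In this sketch the threshold controls how much background is allowed into a visible patch; lowering it would admit more boundary patches that mix vehicle and background pixels.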

Availability of data and materials

The data presented in this study are available on request from the corresponding author.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 62076117 and 62166026, and the Jiangxi Provincial Natural Science Foundation under Grant Nos. 20224BAB212011, 20232BAB212008, and 20232BAB202051.

Author information

Authors and Affiliations

School of Software, Nanchang University, Nanchang, 330047, China
Dong Wang
School of Mathematics and Computer Science, Nanchang University, Nanchang, 330031, China
Qi Wang, Weidong Min, Di Gai, Qing Han & Longfei Li
Institute of Metaverse, Nanchang University, Nanchang, 330031, China
Qi Wang, Weidong Min, Di Gai & Qing Han
Jiangxi Key Laboratory of Smart City, Nanchang, 330031, China
Qi Wang, Weidong Min, Di Gai & Qing Han
School of Public Health, University of Michigan, Ann Arbor, 48109, USA
Yuhan Geng

Contributions

Dong Wang: Methodology, Writing-Original Draft, Conceptualization, Implementation. Qi Wang: Funding Acquisition, Project Administration, Writing-Review and Editing. Weidong Min: Funding Acquisition, Project Administration, Supervision. Di Gai: Formal Analysis, Visualization. Qing Han: Visualization, Data Curation. Longfei Li: Validation, Software. Yuhan Geng: Validation, Investigation.

Corresponding author

Correspondence to Weidong Min.

Ethics declarations

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Dong Wang received his B.E. degree in software engineering from Nanchang University, China, in 2022. He is currently pursuing an M.E. degree at Nanchang University. His current research interests include computer vision.

Qi Wang received his M.E. and Ph.D. degrees from the School of Information Engineering, Nanchang University, China, in 2018 and 2021, respectively. He is currently a lecturer in the School of Software, Nanchang University. He is also an assistant researcher at the Jiangxi Key Laboratory of Smart City, China. His current research focuses on computer vision and deep learning, particularly object re-identification.

Weidong Min received his B.E., M.E., and Ph.D. degrees in computer applications from Tsinghua University, China, in 1989, 1991, and 1995, respectively. He is currently a professor in the School of Mathematics and Computer Science, and dean of the Institute of Metaverse, Nanchang University. He is the dean of Jiangxi Key Laboratory of Smart City, China. He is an executive director of China Society of Image and Graphics. His current research interests include image and video processing, virtual reality, artificial intelligence, big data, and distributed systems.

Di Gai received his M.E. and Ph.D. degrees from the College of Computer Science and Technology, Jilin University, China, in 2018 and 2021, respectively. He is currently a lecturer in the School of Software, Nanchang University. He is also an assistant researcher at the Jiangxi Key Laboratory of Smart City, China. His research interests include medical image processing and pattern recognition, especially image fusion.

Qing Han obtained her B.E. and M.E. degrees in computer applications from Tianjin Polytechnic University, China, in 1997 and 2006, respectively. She is now an associate professor in the School of Mathematics and Computer Science, Nanchang University. Her research interests include image and video processing, and network management.

Longfei Li received his B.E. degree in software engineering from Jiangxi Normal University, China, in 2021. He is currently pursuing an M.E. degree at Nanchang University, China. His current research interests include computer vision.

Yuhan Geng received her B.S. degree in bioinformatics from The Chinese University of Hong Kong, Shenzhen, China, in 2023. She is currently pursuing an M.S. degree at the University of Michigan, Ann Arbor, USA. Her current research interests include computer vision.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.

About this article

Cite this article

Wang, D., Wang, Q., Min, W. et al. SAM-driven MAE pre-training and background-aware meta-learning for unsupervised vehicle re-identification. Comp. Visual Media (2024). https://doi.org/10.1007/s41095-024-0424-2

Received: 04 January 2024
Accepted: 03 March 2024
Published: 15 August 2024
DOI: https://doi.org/10.1007/s41095-024-0424-2
