Recently, person re-identification techniques have been successfully applied in many fields, such as suspect tracking and locating missing persons. Because video generally contains more valuable information, a growing number of researchers focus on video-based person re-identification, especially image-to-video person re-identification (IVPR). However, most existing IVPR models are supervised. Labeling enough training samples requires substantial manual effort, which limits the practical value of these models. In addition, the 2D features extracted from pedestrian images and the 3D features extracted from pedestrian videos are heterogeneous, which poses a significant challenge for the IVPR task. To solve these problems effectively, we propose an unsupervised domain adaption image-to-video person re-identification model based on a cross-modal feature generating and target information preserving transfer network (CMGTN). On the one hand, the generator in our model not only transforms unlabeled target-domain sample features into the source-domain feature space, but also preserves target identity information. On the other hand, we eliminate the gap between pedestrian images and videos by embedding a cross-modal loss term. To evaluate the performance of our approach, we conduct extensive experiments on the PRID-2011, iLIDS-VID and MARS datasets, and compare our approach with existing state-of-the-art IVPR models, including four unsupervised methods and three supervised methods. Experimental results demonstrate the effectiveness of our approach.
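The abstract names three ingredients: mapping target-domain features toward the source domain, preserving target information during that mapping, and a cross-modal term linking image and video features. The sketch below is only a toy illustration of how such terms could be written down, not the CMGTN formulation, which the abstract does not specify. All function names, the linear projections, the mean-matching alignment term (a stand-in for any adversarial objective), and the pairwise-distance preservation term are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # rows = output dimensions


def linear_map(weights: Matrix, x: Vector) -> Vector:
    """Apply a weight matrix to a feature vector (hypothetical linear generator/projection)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean_vector(xs: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(xs) for col in zip(*xs)]


def alignment_loss(generated: List[Vector], source: List[Vector]) -> float:
    """Assumed stand-in for domain alignment: match the means of the two feature sets."""
    return sq_dist(mean_vector(generated), mean_vector(source))


def preservation_loss(target: List[Vector], generated: List[Vector]) -> float:
    """Assumed stand-in for target information preservation:
    penalise changes in pairwise distances among target samples after mapping."""
    total, pairs = 0.0, 0
    for i in range(len(target)):
        for j in range(i + 1, len(target)):
            before = math.sqrt(sq_dist(target[i], target[j]))
            after = math.sqrt(sq_dist(generated[i], generated[j]))
            total += (before - after) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


def cross_modal_loss(images: List[Vector], videos: List[Vector],
                     w_img: Matrix, w_vid: Matrix) -> float:
    """Assumed cross-modal term: images[i] and videos[i] are taken to show the same
    person; both are projected into a common space and their distance is penalised."""
    total = 0.0
    for img, vid in zip(images, videos):
        total += sq_dist(linear_map(w_img, img), linear_map(w_vid, vid))
    return total / len(images)


if __name__ == "__main__":
    # Toy data: 2-dimensional image features, 3-dimensional video features.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    target = [[0.0, 0.0], [3.0, 4.0]]
    generated = [linear_map(identity, x) for x in target]
    print(preservation_loss(target, generated))
    print(alignment_loss(generated, [[1.0, 1.0], [2.0, 3.0]]))
    print(cross_modal_loss([[1.0, 2.0]], [[1.0, 2.0, 5.0]],
                           identity, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
```

In a trainable model these terms would be weighted, summed, and minimised over the projection parameters; the weights and the optimiser are left out here because the abstract gives no details about them.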
The authors would like to thank the editor, the associate editor, and the anonymous reviewers for their constructive comments, which helped improve this work. This work was supported by the NSFC-Key Project under Grant No. 61933013, the NSFC-Key Project of General Technology Fundamental Research United Fund under Grant No. U1736211, the Key Project of Natural Science Foundation of Hubei Province under Grant No. 2018CFA024, and the Natural Science Foundation of Guangdong Province under Grant No. 2019A1515011076.
Cite this article
Zhang, X., Li, S., Jing, XY. et al. Unsupervised domain adaption for image-to-video person re-identification. Multimed Tools Appl 79, 33793–33810 (2020). https://doi.org/10.1007/s11042-019-08550-9
Keywords
- Unsupervised domain adaption
- Person re-identification
- Deep learning