Unsupervised domain adaption for image-to-video person re-identification


Person re-identification has recently been applied successfully in many fields, such as suspect tracking and locating missing persons. Because video carries more information than a single image, a growing number of researchers focus on video-based person re-identification, and in particular on image-to-video person re-identification (IVPR). However, most existing IVPR models are supervised. Labeling enough training samples requires substantial manual effort, which limits the practical value of these models. In addition, the 2D features extracted from pedestrian images and the 3D features extracted from pedestrian videos are heterogeneous, which poses a significant challenge for the IVPR task. To address these problems effectively, we propose an unsupervised domain adaption model for image-to-video person re-identification, the cross-modal feature generating and target information preserving transfer network (CMGTN). On the one hand, the generator in our model transforms the features of unlabeled target-domain samples into the source-domain feature space while preserving target identity information. On the other hand, we eliminate the gap between pedestrian images and videos by embedding a cross-modal loss term. To evaluate our approach, we conduct extensive experiments on the PRID-2011, iLIDS-VID and MARS datasets and compare it with existing state-of-the-art IVPR models, including four unsupervised methods and three supervised methods. Experimental results demonstrate the effectiveness of our approach.





The authors would like to thank the editor, the associate editor, and the anonymous reviewers for their constructive comments, which helped improve our work. This work was supported by the NSFC-Key Project under Grant No. 61933013, the NSFC-Key Project of General Technology Fundamental Research United Fund under Grant No. U1736211, the Key Project of Natural Science Foundation of Hubei Province under Grant No. 2018CFA024, and the Natural Science Foundation of Guangdong Province under Grant No. 2019A1515011076.

Author information



Corresponding author

Correspondence to Xiao-Yuan Jing.


About this article


Cite this article

Zhang, X., Li, S., Jing, XY. et al. Unsupervised domain adaption for image-to-video person re-identification. Multimed Tools Appl 79, 33793–33810 (2020). https://doi.org/10.1007/s11042-019-08550-9



  • Unsupervised domain adaption
  • Image-to-video
  • Person re-identification
  • GAN
  • Deep learning