MARS: A Video Benchmark for Large-Scale Person Re-Identification

  • Liang Zheng
  • Zhi Bie
  • Yifan Sun
  • Jingdong Wang
  • Chi Su
  • Shengjin WangEmail author
  • Qi Tian
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9910)


This paper considers person re-identification (re-id) in videos. We introduce a new video re-id dataset, named Motion Analysis and Re-identification Set (MARS), a video extension of the Market-1501 dataset. To our knowledge, MARS is the largest video re-id dataset to date. Containing 1,261 IDs and around 20,000 tracklets, it provides rich visual information compared to image-based datasets. Meanwhile, MARS reaches a step closer to practice. The tracklets are automatically generated by the Deformable Part Model (DPM) as pedestrian detector and the GMMCP tracker. A number of false detection/tracking results are also included as distractors which would exist predominantly in practical video databases. Extensive evaluation of the state-of-the-art methods including the space-time descriptors and CNN is presented. We show that CNN in classification mode can be trained from scratch using the consecutive bounding boxes of each identity. The learned CNN embedding outperforms other competing methods considerably and has good generalization ability on other video re-id datasets upon fine-tuning.


Video person re-identification Motion features CNN 



We thank Zhun Zhong and Linghui Li for providing some benchmarking results. This work was supported by Initiative Scientific Research Program of Ministry of Education under Grant No. 20141081253. This work was supported in part to Dr. Qi Tian by ARO grants W911NF-15-1-0290, Faculty Research Gift Awards by NEC Laboratories of America and Blippar, and National Science Foundation of China (NSFC) 61429201.


  1. 1.
    Ahmed, E., Jones, M., Marks, T.K.: An improved deep learning architecture for person re-identification. In: CVPR (2015)Google Scholar
  2. 2.
    Arandjelovic, R., Zisserman, A.: Multiple queries for large scale specific object retrieval. In: BMVC (2012)Google Scholar
  3. 3.
    Baltieri, D., Vezzani, R., Cucchiara, R.: 3dpes: 3d people dataset for surveillance and forensics. In: ACM Workshop on Human Gesture and Behavior Understanding (2011)Google Scholar
  4. 4.
    Bialkowski, A., Denman, S., Sridharan, S., Fookes, C., Lucey, P.: A database for person re-identification in multi-camera surveillance networks. In: DICTA (2012)Google Scholar
  5. 5.
    Chen, D., Yuan, Z., Hua, G., Zheng, N., Wang, J.: Similarity learning on an explicit polynomial kernel feature map for person re-identification. In: CVPR (2015)Google Scholar
  6. 6.
    Das, A., Chakraborty, A., Roy-Chowdhury, A.K.: Consistent re-identification in a camera network. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 330–345. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10605-2_22 Google Scholar
  7. 7.
    Dehghan, A., Assari, S.M., Shah, M.: Gmmcp tracker: Globally optimal generalized maximum multi clique problem for multiple object tracking. In: CVPR (2015)Google Scholar
  8. 8.
    Ding, S., Lin, L., Wang, G., Chao, H.: Deep feature learning with relative distance comparison for person re-identification. Pattern Recogn. 48(10), 2993–3003 (2015)Google Scholar
  9. 9.
    Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (2005)Google Scholar
  10. 10.
    Farenzena, M., Bazzani, L., Perina, A., Murino, V., Cristani, M.: Person re-identification by symmetry-driven accumulation of local features. In: CVPR (2010)Google Scholar
  11. 11.
    Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. Pattern Anal. Mach. Intell. IEEE Trans. 32(9), 1627–1645 (2010)CrossRefGoogle Scholar
  12. 12.
    Gray, D., Brennan, S., Tao, H.: Evaluating appearance models for recognition, reacquisition, and tracking. In: Proceedings IEEE International Workshop on Performance Evaluation for Tracking and Surveillance, vol. 3 (2007)Google Scholar
  13. 13.
    Han, J., Bhanu, B.: Individual recognition using gait energy image. Pattern Anal. Mach. Intell. IEEE Trans. 28(2), 316–322 (2006)CrossRefGoogle Scholar
  14. 14.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)Google Scholar
  15. 15.
    Hirzer, M., Beleznai, C., Roth, P.M., Bischof, H.: Person re-identification by descriptive and discriminative classification. In: Image Analysis, pp. 91–102 (2011)Google Scholar
  16. 16.
    Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. In: ACM Multimedia, pp. 675–678 (2014)Google Scholar
  17. 17.
    Ke, Y., Sukthankar, R., Hebert, M.: Volumetric features for video event detection. Int. J. Comput. Vis. 88(3), 339–362 (2010)MathSciNetCrossRefGoogle Scholar
  18. 18.
    Klaser, A., Marszałek, M., Schmid, C.: A spatio-temporal descriptor based on 3d-gradients. In: BMVC (2008)Google Scholar
  19. 19.
    Kostinger, M., Hirzer, M., Wohlhart, P., Roth, P.M., Bischof, H.: Large scale metric learning from equivalence constraints. In: CVPR, pp. 2288–2295 (2012)Google Scholar
  20. 20.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS (2012)Google Scholar
  21. 21.
    Laptev, I.: On space-time interest points. Int. J. Comput. Vis. 64(2–3), 107–123 (2005)CrossRefGoogle Scholar
  22. 22.
    Li, W., Wang, X.: Locally aligned feature transforms across views. In: CVPR (2013)Google Scholar
  23. 23.
    Li, W., Zhao, R., Xiao, T., Wang, X.: Deepreid: Deep filter pairing neural network for person re-identification. In: CVPR, pp. 152–159 (2014)Google Scholar
  24. 24.
    Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: CVPR (2015)Google Scholar
  25. 25.
    Liu, K., Ma, B., Zhang, W., Huang, R.: A spatio-temporal appearance representation for video-based pedestrian re-identification. In: CVPR, pp. 3810–3818 (2015)Google Scholar
  26. 26.
    Luo, P., Wang, X., Tang, X.: Pedestrian parsing via deep decompositional network. In: ICCV (2013)Google Scholar
  27. 27.
    Ma, B., Su, Y., Jurie, F.: Covariance descriptor based on bio-inspired features for person re-identification and face verification. IVC 32(6), 379–390 (2014)CrossRefGoogle Scholar
  28. 28.
    Martinel, N., Micheloni, C., Piciarelli, C.: Distributed signature fusion for person re-identification. In: ICDSC (2012)Google Scholar
  29. 29.
    McLaughlin, N., Martinez del Rincon, J., Miller, P.: Recurrent convolutional network for video-based person re-identification. In: CVPR (2016)Google Scholar
  30. 30.
    Poppe, R.: A survey on vision-based human action recognition. Image Vis. Comput. 28(6), 976–990 (2010)CrossRefGoogle Scholar
  31. 31.
    Sánchez, J., Perronnin, F., Mensink, T., Verbeek, J.: Image classification with the fisher vector: Theory and practice. IJCV 105(3), 222–245 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
  32. 32.
    Schwartz, W.R., Davis, L.S.: Learning discriminative appearance-based models using partial least squares. In: SIBGRAPI (2009)Google Scholar
  33. 33.
    Scovanner, P., Ali, S., Shah, M.: A 3-dimensional sift descriptor and its application to action recognition. In: ACM Multimedia (2007)Google Scholar
  34. 34.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint (2014). arXiv:1409.1556
  35. 35.
    Su, C., Yang, F., Zhang, S., Tian, Q., Davis, L.S., Gao, W.: Multi-task learning with low rank attribute embedding for person re-identification. In: CVPR (2015)Google Scholar
  36. 36.
    Wang, T., Gong, S., Zhu, X., Wang, S.: Person re-identification by video ranking. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 688–703. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10593-2_45 Google Scholar
  37. 37.
    Willems, G., Tuytelaars, T., Gool, L.: An efficient dense and scale-invariant spatio-temporal interest point detector. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5303, pp. 650–663. Springer, Heidelberg (2008). doi: 10.1007/978-3-540-88688-4_48 CrossRefGoogle Scholar
  38. 38.
    Wu, L., Shen, C., Hengel, A.V.D.: Deep recurrent convolutional networks for video-based person re-identification: An end-to-end approach. arXiv preprint (2016). arXiv:1606.01609
  39. 39.
    Xiong, F., Gou, M., Camps, O., Sznaier, M.: Person re-identification using kernel-based metric learning methods. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 1–16. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10584-0_1 Google Scholar
  40. 40.
    Yi, D., Lei, Z., Liao, S., Li, S.Z.: Deep metric learning for person re-identification. In: ICPR, pp. 34–39 (2014)Google Scholar
  41. 41.
    You, J., Wu, A., Li, X., Zheng, W.S.: Top-push video-based person re-identification. In: CVPR (2016)Google Scholar
  42. 42.
    Zhao, R., Ouyang, W., Wang, X.: Unsupervised salience learning for person re-identification. In: CVPR (2013)Google Scholar
  43. 43.
    Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: CVPR (2015)Google Scholar
  44. 44.
    Zheng, L., Wang, S., Liu, Z., Tian, Q.: Lp-norm idf for large scale image search. In: CVPR (2013)Google Scholar
  45. 45.
    Zheng, L., Wang, S., Tian, L., He, F., Liu, Z., Tian, Q.: Query-adaptive late fusion for image search and person re-identification. In: CVPR (2015)Google Scholar
  46. 46.
    Zheng, L., Zhang, H., Sun, S., Chandraker, M., Tian, Q.: Person re-identification in the wild. arXiv preprint (2016). arXiv:1604.02531
  47. 47.
    Zheng, W.S., Gong, S., Xiang, T.: Associating groups of people. In: BMVC, vol. 2, p. 6 (2009)Google Scholar

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Liang Zheng
    • 1
    • 3
  • Zhi Bie
    • 1
  • Yifan Sun
    • 1
  • Jingdong Wang
    • 2
  • Chi Su
    • 4
  • Shengjin Wang
    • 1
    Email author
  • Qi Tian
    • 3
  1. 1.Tsinghua UniversityBeijingChina
  2. 2.Microsoft ResearchBeijingChina
  3. 3.UTSASan AntonioUSA
  4. 4.Peking UniversityBeijingChina

Personalised recommendations