Multimedia Tools and Applications, Volume 78, Issue 2, pp 1583–1611

Deep salient-Gaussian Fisher vector encoding of the spatio-temporal trajectory structures for person re-identification

  • Salma Ksibi
  • Mahmoud Mejdoub
  • Chokri Ben Amar


In this paper, we propose a deep spatio-temporal appearance (DSTA) descriptor for person re-identification (re-ID). The descriptor is built on a deep Fisher vector (FV) encoding of the spatio-temporal structures of trajectories, which robustly handle misalignment in pedestrian tracklets. The deep encoding exploits the rich spatio-temporal structural information around the trajectories by encoding the trajectory structures hierarchically, with each layer covering a larger tracklet neighborhood scale than the previous one. To suppress the noisy background around the pedestrian and to model the uniqueness of the pedestrian's identity, we extend the deep FV encoder into the deep Salient-Gaussian weighted FV (deepSGFV) encoder, which integrates the pedestrian Gaussian template (for background suppression) and the saliency template (for identity uniqueness) into the encoding process. Without requiring either pre-training or data augmentation, the proposed descriptor achieves accuracy competitive with state-of-the-art methods, in particular deep CNN ones, on four challenging pedestrian video datasets: PRID2011, i-LIDS-VID, MARS and LPW. Combining DSTA with a deep CNN further improves on current state-of-the-art methods and demonstrates that the two are complementary.
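As a rough illustration of what weighting a Fisher vector with a Gaussian template can look like, the following minimal Python sketch computes a single-layer, first-order (mean-gradient) FV in which each local descriptor is scaled by a weight that decays with horizontal distance from the image centre. It is not the authors' deepSGFV encoder: the function names, the `sigma_ratio` parameter, the centre-based template and the toy GMM parameters are all assumptions made for illustration, and the hierarchical layers and the saliency template are left out.

```python
import math


def gaussian_template_weight(x, width, sigma_ratio=0.25):
    """Weight in (0, 1] that decays with horizontal distance from the image centre.

    Assumption: the pedestrian is roughly centred, so off-centre descriptors
    are more likely to belong to the background.
    """
    centre = (width - 1) / 2.0
    sigma = sigma_ratio * width
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))


def posteriors(desc, priors, means, sigmas):
    """Soft assignment of one descriptor to each diagonal-covariance Gaussian."""
    log_p = []
    for pi_k, mu_k, sg_k in zip(priors, means, sigmas):
        ll = math.log(pi_k)
        for d, m, s in zip(desc, mu_k, sg_k):
            ll += -0.5 * ((d - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        log_p.append(ll)
    top = max(log_p)  # subtract the maximum for numerical stability
    exp_p = [math.exp(v - top) for v in log_p]
    total = sum(exp_p)
    return [v / total for v in exp_p]


def weighted_fisher_vector(descs, weights, priors, means, sigmas):
    """First-order Fisher vector with per-descriptor weights (flattened K * D list)."""
    dim = len(means[0])
    fv = [[0.0] * dim for _ in priors]
    for desc, w in zip(descs, weights):
        gamma = posteriors(desc, priors, means, sigmas)
        for k, g in enumerate(gamma):
            for j in range(dim):
                fv[k][j] += w * g * (desc[j] - means[k][j]) / sigmas[k][j]
    norm = sum(weights)
    if norm == 0:
        return [0.0] * (len(priors) * dim)
    return [fv[k][j] / (norm * math.sqrt(priors[k]))
            for k in range(len(priors)) for j in range(dim)]


# Toy usage: two 2-D descriptors at columns 4 and 0 of a 9-pixel-wide crop.
priors = [0.5, 0.5]
means = [[0.0, 0.0], [1.0, 1.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
descs = [[0.2, 0.1], [0.9, 1.2]]
weights = [gaussian_template_weight(x, 9) for x in (4, 0)]
fv = weighted_fisher_vector(descs, weights, priors, means, sigmas)
```

With uniform weights this reduces to the usual first-order FV averaged over the descriptors; a saliency map could be folded in the same way by multiplying it into `weights`, although how the paper combines the two templates is not specified in the abstract.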


Keywords: Person re-identification; Deep weighted encoding; Spatio-temporal trajectory structures; Deep spatio-temporal appearance descriptor; Deep CNN




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Salma Ksibi (1)
  • Mahmoud Mejdoub (1)
  • Chokri Ben Amar (1)

  1. REGIM: Research Groups on Intelligent Machines, University of Sfax, ENIS, Sfax, Tunisia
