A spatial and temporal features mixture model with body parts for video-based person re-identification

Liu, Jie; Sun, Cheng; Xu, Xiang; Xu, Baomin; Yu, Shuangyuan

doi:10.1007/s10489-019-01459-8

Published: 12 April 2019

Applied Intelligence, volume 49, pages 3436–3446 (2019)

Jie Liu¹, Cheng Sun², Xiang Xu³, Baomin Xu¹ (ORCID: orcid.org/0000-0002-4037-4942) & Shuangyuan Yu¹

Abstract

The goal of video-based person re-identification is to recognize the same person across different cameras. Most previous methods represent a person with features extracted from the full body. In this paper, we propose a novel Spatial and Temporal Features Mixture Model (STFMM). Unlike previous approaches, our model first splits the human body horizontally into N parts, which capture information about the head, waist, legs and so on. The features of the parts are then integrated to obtain a more expressive representation of each person. Experiments on the iLIDS-VID and PRID-2011 datasets demonstrate that our approach outperforms existing video-based person re-identification methods and significantly improves stability. Our model achieves a rank-1 CMC accuracy of 73.6% on the iLIDS-VID dataset and a rank-1 CMC accuracy of 47.8% in cross-dataset testing.
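
To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the STFMM architecture itself, whose details are not given on this page. It assumes that a simple per-channel mean stands in for a learned part feature, that temporal fusion is plain averaging over frames, and that rank-1 accuracy is computed in a single-shot setting where probe i and gallery i share an identity. The function names `split_into_parts`, `sequence_descriptor` and `rank1_accuracy` are hypothetical.

```python
import numpy as np


def split_into_parts(frame, n_parts):
    """Split an (H, W, C) frame into n_parts horizontal stripes, top to bottom."""
    return np.array_split(frame, n_parts, axis=0)


def sequence_descriptor(frames, n_parts):
    """Describe a (T, H, W, C) sequence by a vector of length n_parts * C.

    Assumption: the mean colour of each stripe is a stand-in for a learned
    part feature; a real model would use a trained network here.
    """
    per_frame = []
    for frame in frames:
        parts = split_into_parts(frame, n_parts)
        # One small feature per body part, concatenated in top-to-bottom order.
        per_frame.append(np.concatenate([p.mean(axis=(0, 1)) for p in parts]))
    # Assumption: temporal fusion by simple averaging over frames.
    return np.mean(per_frame, axis=0)


def rank1_accuracy(probe, gallery):
    """Fraction of probes whose nearest gallery descriptor has the same index.

    Assumption: probe[i] and gallery[i] belong to the same identity.
    """
    dists = np.linalg.norm(probe[:, None, :] - gallery[None, :, :], axis=2)
    return float(np.mean(dists.argmin(axis=1) == np.arange(len(probe))))


# Usage: a toy 4-frame sequence whose top third is bright, split into 3 parts.
frames = np.zeros((4, 12, 6, 3))
frames[:, :4] = 1.0
descriptor = sequence_descriptor(frames, n_parts=3)
```

Concatenating the part features in a fixed top-to-bottom order keeps each position in the descriptor tied to the same body region, which is what lets a distance between two descriptors compare head with head and legs with legs.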

Acknowledgements

This research was supported by the National Natural Science Foundation of China (NSFC 61572005, 61672086, 61702030, 61771058) and by the Key Projects of Science and Technology Research of Hebei Province Higher Education (ZD2017304).

Author information

Authors and Affiliations

School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, People’s Republic of China
Jie Liu, Baomin Xu & Shuangyuan Yu
School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, 8190395, Japan
Cheng Sun
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Xiang Xu

Corresponding author

Correspondence to Baomin Xu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, J., Sun, C., Xu, X. et al. A spatial and temporal features mixture model with body parts for video-based person re-identification. Appl Intell 49, 3436–3446 (2019). https://doi.org/10.1007/s10489-019-01459-8

Published: 12 April 2019
Issue Date: 15 September 2019
DOI: https://doi.org/10.1007/s10489-019-01459-8
