Long-Short Temporal–Spatial Clues Excited Network for Robust Person Re-identification

Li, Shuai; Song, Wenfeng; Fang, Zheng; Shi, Jiaying; Hao, Aimin; Zhao, Qinping; Qin, Hong

doi:10.1007/s11263-020-01349-4

Published: 15 July 2020

Volume 128, pages 2936–2961, (2020)

International Journal of Computer Vision

Shuai Li¹,²,
Wenfeng Song¹ (ORCID: orcid.org/0000-0002-5101-1071),
Zheng Fang¹,
Jiaying Shi¹,
Aimin Hao¹,²,
Qinping Zhao¹,² &
Hong Qin³

A Correction to this article was published on 06 July 2021

This article has been updated

Abstract

Benefiting directly from the rapid advancement of deep learning methods, person re-identification (Re-ID) applications have become widespread, with remarkable successes in recent years. Nevertheless, cross-scene Re-ID is still hindered by large view variation, because temporal clues are hard to exploit effectively: the computational burden is heavy, and discriminative features are difficult to incorporate flexibly. To alleviate these problems, we propose a long-short temporal–spatial clues excited network (LSTS-NET) for robust person Re-ID across different scenes. In essence, our LSTS-NET comprises a motion appearance model and a motion-refinement aggregating scheme. The former abstracts temporal clues based on multi-range low-rank analysis, both in consecutive frames and in cross-camera videos, which augments the person-related features with details while suppressing the cluttered background across different scenes. The latter aggregates the temporal clues with spatial features: it automatically activates the person-specific features by incorporating personalized motion-refinement layers and several motion-excitation CNN blocks into deep networks, which expedites the extraction and learning of discriminative features from different temporal clues. As a result, our LSTS-NET can robustly distinguish persons across different scenes. To verify the improvement brought by LSTS-NET, we conduct extensive experiments and comprehensive evaluations on 8 widely recognized public benchmarks. All the experiments confirm that our LSTS-NET can significantly boost the Re-ID performance of existing deep learning methods and outperforms the state-of-the-art methods in terms of robustness and accuracy.
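
As a rough illustration of the general low-rank idea mentioned in the abstract, the sketch below splits a short stack of frames into a rank-1 "background" and a residual that highlights what moves. It is not the authors' multi-range low-rank analysis, whose details are not given here; the function name `rank1_background`, the flattened-frame layout, and the toy data are all assumptions made for illustration only.

```python
import math

def rank1_background(frames, iters=50):
    """Hypothetical helper: frames is a list of T frames, each a flat
    list of P pixel values. Returns (background, residual), both T x P."""
    T, P = len(frames), len(frames[0])
    v = [1.0 / math.sqrt(P)] * P  # initial unit vector over pixels
    for _ in range(iters):
        # Power iteration on M^T M: u = M v, then w = M^T u.
        u = [sum(frames[t][p] * v[p] for p in range(P)) for t in range(T)]
        w = [sum(frames[t][p] * u[t] for t in range(T)) for p in range(P)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all-zero input: nothing to decompose
            break
        v = [x / norm for x in w]
    # Best rank-1 approximation is (M v) v^T.
    u = [sum(frames[t][p] * v[p] for p in range(P)) for t in range(T)]
    background = [[u[t] * v[p] for p in range(P)] for t in range(T)]
    residual = [[frames[t][p] - background[t][p] for p in range(P)]
                for t in range(T)]
    return background, residual

# Toy data (assumed): a static scene of intensity 5 with one bright
# pixel (+3) that moves one position per frame.
frames = [[5.0 + (3.0 if p == t else 0.0) for p in range(4)] for t in range(4)]
background, residual = rank1_background(frames)
# Position of the largest residual in each frame, i.e. the moving pixel.
moving = [max(range(4), key=lambda p: residual[t][p]) for t in range(4)]
```

In this toy case the static scene is absorbed by the low-rank part, and the largest residual in each frame sits on the moving pixel. A realistic pipeline would use a robust decomposition and real video tensors rather than a single rank-1 fit.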

Change history

06 July 2021
A Correction to this paper has been published: https://doi.org/10.1007/s11263-021-01497-1

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2018YFB1700603), the National Natural Science Foundation of China (No. 61672077 and 61532002), the Beijing Natural Science Foundation Haidian Primitive Innovation Joint Fund (L182016), and the National Science Foundation of USA under Grants IIS-1715985 and IIS-1812606.

Author information

Shuai Li and Wenfeng Song have contributed equally.

Authors and Affiliations

State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Shuai Li, Wenfeng Song, Zheng Fang, Jiaying Shi, Aimin Hao & Qinping Zhao
Peng Cheng Laboratory, 518055, Shenzhen, China
Shuai Li, Aimin Hao & Qinping Zhao
Stony Brook University, Stony Brook, USA
Hong Qin

Corresponding authors

Correspondence to Wenfeng Song or Hong Qin.

Additional information

Communicated by Patrick Perez.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 4844 KB)

Supplementary material 2 (pdf 231 KB)

About this article

Cite this article

Li, S., Song, W., Fang, Z. et al. Long-Short Temporal–Spatial Clues Excited Network for Robust Person Re-identification. Int J Comput Vis 128, 2936–2961 (2020). https://doi.org/10.1007/s11263-020-01349-4

Received: 01 February 2020
Accepted: 17 June 2020
Published: 15 July 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11263-020-01349-4
