Multi-camera person re-identification using spatiotemporal context modeling

  • Original Article
  • Neural Computing and Applications

Abstract

Person re-identification (ReID) aims to identify a person of interest (POI) across multiple non-overlapping cameras. The POI can appear either in a single image or in a video sequence. Occlusion, viewpoint variation, misalignment, unconstrained poses, and background clutter are the major challenges in developing robust person ReID models. To address these issues, this paper presents an attention mechanism that learns local part/region-aggregated feature representations by incorporating long-range local and global context modeling. The part-aware local attention blocks are integrated into a modified, pre-trained ResNet50 CNN backbone using two attention blocks: the Spatio-Temporal Attention Module (STAM) and the Channel Attention Module (CAM). The spatial attention block of STAM learns contextual dependencies between different human body parts/regions, such as the head, upper body, lower body, and shoes, within a single frame. The temporal attention block of STAM learns temporal contextual dependencies of the same person's body parts across all video frames. Lastly, the channel-based attention block, CAM, models semantic connections between the channels of the feature maps. The STAM and CAM blocks are combined sequentially to form a unified attention network, named the Spatio-Temporal Channel Attention Network (STCANet), that can learn both short-range and long-range global feature maps. Extensive experiments are carried out to study the effectiveness of STCANet on three image-based and two video-based benchmark datasets: Market-1501, DukeMTMC-ReID, MSMT17, DukeMTMC-VideoReID, and MARS. K-reciprocal re-ranking of the gallery set is also applied, with which the proposed network shows a significant improvement on these datasets in comparison with the state of the art. Lastly, to study the generalizability of STCANet to unseen test instances, cross-validation on external cohorts is performed; the results indicate that the proposed model is robust and can be readily deployed in real-world applications.
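
The sequential spatial, temporal, and channel attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' STAM or CAM implementation: the abstract does not give their formulation, so the sketch assumes plain scaled dot-product self-attention with a residual connection and no learned projections. The function names (`self_attention`, `spatial_attention`, `temporal_attention`, `channel_attention`, `stca_sketch`) and the toy feature layout of frames x parts x channels are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Residual scaled dot-product self-attention over equal-length vectors.

    Assumption: queries, keys and values are the tokens themselves (no
    learned projections), which keeps the sketch free of trainable weights.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        context = [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        attended.append([x + c for x, c in zip(query, context)])  # residual add
    return attended


def spatial_attention(clip):
    """Relate the body parts within each frame (tokens = parts of one frame)."""
    return [self_attention(frame) for frame in clip]


def temporal_attention(clip):
    """Relate the same body part across frames (tokens = one part over time)."""
    num_parts = len(clip[0])
    tracks = [self_attention([frame[p] for frame in clip]) for p in range(num_parts)]
    return [[tracks[p][t] for p in range(num_parts)] for t in range(len(clip))]


def channel_attention(clip):
    """Relate channels (tokens = one channel's responses over all positions)."""
    num_parts = len(clip[0])
    positions = [part for frame in clip for part in frame]  # (T*P) x C
    channels = [list(col) for col in zip(*positions)]       # C x (T*P)
    mixed = self_attention(channels)
    restored = [list(row) for row in zip(*mixed)]           # back to (T*P) x C
    return [restored[t * num_parts:(t + 1) * num_parts] for t in range(len(clip))]


def stca_sketch(clip):
    """Apply spatial, temporal, then channel attention in sequence."""
    return channel_attention(temporal_attention(spatial_attention(clip)))


# Toy clip: 3 frames x 4 parts (head, upper body, lower body, shoes) x 2 channels.
clip = [[[0.1 * (t + 1), 0.2 * (p + 1)] for p in range(4)] for t in range(3)]
out = stca_sketch(clip)
print(len(out), len(out[0]), len(out[0][0]))  # same layout as the input
```

Each stage only changes which axis supplies the attention tokens, so the output keeps the input layout and the stages can be chained in any order. A practical model would presumably operate on CNN feature maps with learned query, key, and value projections in a tensor library; the pure-Python lists here are used only to keep the sketch dependency-free.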

Data availability

The Market-1501 and MARS datasets that support the findings of this study are publicly available on Kaggle at https://www.kaggle.com/datasets/pengcw1/market-1501 and https://www.kaggle.com/datasets/twoboysandhats/mars-motion-analysis-andreidentification-set. The MSMT17 dataset is not openly available due to privacy concerns, but it is available on reasonable request from https://www.pkuvmc.com/dataset.html. The DukeMTMC and DukeMTMC-VideoReID datasets are made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.txt). References for all data analyzed during this study are included in the article.

Acknowledgements

We acknowledge partial support from the National Center of Big Data and Cloud Computing (NCBC) and the Higher Education Commission (HEC) of Pakistan for conducting this research.

Author information

Corresponding author

Correspondence to Usama Ijaz Bajwa.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest associated with this publication.

About this article

Cite this article

Zulfiqar, F., Bajwa, U.I. & Raza, R.H. Multi-camera person re-identification using spatiotemporal context modeling. Neural Comput & Applic 35, 20117–20142 (2023). https://doi.org/10.1007/s00521-023-08799-0

  • DOI: https://doi.org/10.1007/s00521-023-08799-0
