Multi-camera person re-identification using spatiotemporal context modeling

Zulfiqar, Fatima; Bajwa, Usama Ijaz; Raza, Rana Hammad

doi:10.1007/s00521-023-08799-0

Original Article
Published: 18 July 2023

Volume 35, pages 20117–20142, (2023)
Neural Computing and Applications

Abstract

Person re-identification (ReID) aims to identify a person of interest (POI) across multiple non-overlapping cameras. The POI may appear either in a single image or in a video sequence. Occlusion, viewpoint variation, misalignment, unconstrained poses, and background clutter are the major challenges in developing robust person ReID models. To address these issues, this paper presents an attention mechanism that learns part/region-aggregated local feature representations while modeling long-range local and global context. The part-aware local attention blocks are integrated into a modified, pre-trained ResNet50 CNN backbone by means of two attention blocks: the Spatio-Temporal Attention Module (STAM) and the Channel Attention Module (CAM). The spatial attention block of STAM learns contextual dependencies between different human body parts/regions, such as the head, upper body, lower body, and shoes, within a single frame. The temporal attention block, in turn, learns temporal contextual dependencies of the same person's body parts across all video frames. Lastly, the channel-based attention block, CAM, models semantic connections between the channels of the feature maps. The STAM and CAM blocks are combined sequentially to form a unified attention network, named the Spatio-Temporal Channel Attention Network (STCANet), which can learn both short-range and long-range global feature maps. Extensive experiments are carried out to study the effectiveness of STCANet on three image-based and two video-based benchmark datasets: Market-1501, DukeMTMC-ReID, MSMT17, DukeMTMC-VideoReID, and MARS. K-reciprocal re-ranking of the gallery set is also applied, with which the proposed network shows a significant improvement on these datasets compared with the state of the art. Finally, to study how well STCANet generalizes to unseen test instances, cross-validation on external cohorts is also performed; the results indicate that the proposed model is robust and can be readily deployed in real-world applications.
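
The k-reciprocal re-ranking step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified version of the general technique, assuming a square pairwise distance matrix over all query and gallery samples. It omits the neighbor-set expansion and local query expansion steps of the published re-ranking method, and the function names and the parameters k and lam are hypothetical, chosen only for illustration.

```python
from typing import List, Set

Matrix = List[List[float]]


def top_k(dist_row: List[float], k: int) -> Set[int]:
    """Indices of the k smallest distances in one row (ties broken by index)."""
    order = sorted(range(len(dist_row)), key=lambda j: (dist_row[j], j))
    return set(order[:k])


def k_reciprocal_sets(dist: Matrix, k: int) -> List[Set[int]]:
    """For each sample i, keep neighbor j only if i is also among j's k nearest."""
    n = len(dist)
    neighbours = [top_k(dist[i], k) for i in range(n)]
    return [{j for j in neighbours[i] if i in neighbours[j]} for i in range(n)]


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """1 - |a & b| / |a | b|; defined as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def rerank(dist: Matrix, k: int, lam: float) -> Matrix:
    """Blend the Jaccard distance of k-reciprocal sets with the original distance.

    lam (assumed to lie in [0, 1]) weights the original distance;
    (1 - lam) weights the Jaccard term.
    """
    recip = k_reciprocal_sets(dist, k)
    n = len(dist)
    return [
        [
            (1.0 - lam) * jaccard_distance(recip[i], recip[j]) + lam * dist[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]
```

As a toy usage example, take five samples whose features are the scalars 0, 1, 2, 10, and 11, with absolute difference as the distance and k = 2. Samples 0 and 1 then share the same k-reciprocal set, so their Jaccard term is zero and their re-ranked distance shrinks, while samples from the two distant clusters share no reciprocal neighbors and keep a Jaccard term of 1.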

Data availability

The Market-1501 and MARS datasets that support the findings of this study are publicly available on Kaggle at https://www.kaggle.com/datasets/pengcw1/market-1501 and https://www.kaggle.com/datasets/twoboysandhats/mars-motion-analysis-andreidentification-set. The MSMT17 dataset is not openly available due to privacy concerns but is available on reasonable request from https://www.pkuvmc.com/dataset.html. The DukeMTMC and DukeMTMC-VideoReID datasets are made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.txt). References for all data analyzed during this study are included in the article.

Acknowledgements

We acknowledge partial support from the National Center of Big Data and Cloud Computing (NCBC) and the Higher Education Commission (HEC) of Pakistan for conducting this research.

Author information

Authors and Affiliations

Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Lahore, Pakistan
Fatima Zulfiqar & Usama Ijaz Bajwa
Pakistan Navy Engineering College, National University of Sciences and Technology (NUST), Karachi, Pakistan
Rana Hammad Raza

Corresponding author

Correspondence to Usama Ijaz Bajwa.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest associated with this publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zulfiqar, F., Bajwa, U.I. & Raza, R.H. Multi-camera person re-identification using spatiotemporal context modeling. Neural Comput & Applic 35, 20117–20142 (2023). https://doi.org/10.1007/s00521-023-08799-0

Received: 23 July 2022
Accepted: 28 June 2023
Published: 18 July 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s00521-023-08799-0
