Abstract
Person re-identification (ReID) aims to identify a person of interest (POI) across multiple non-overlapping cameras. The POI may appear in either a single image or a video sequence. Occlusion, viewpoint variation, misalignment, unconstrained poses, and background clutter are the major challenges in developing robust person ReID models. To address these issues, this paper presents an attention mechanism that learns local part/region-aggregated feature representations by incorporating long-range local and global context modeling. The part-aware local attention blocks are integrated into a modified, pre-trained ResNet50 CNN backbone using two attention blocks: the Spatio-Temporal Attention Module (STAM) and the Channel Attention Module (CAM). The spatial attention block of STAM learns contextual dependencies between different human body parts/regions, such as the head, upper body, lower body, and shoes, within a single frame. The temporal attention block, in turn, learns temporal contextual dependencies of the same person's body parts across all video frames. Lastly, the channel-based attention block, CAM, models semantic connections between the channels of the feature maps. The STAM and CAM blocks are combined sequentially to form a unified attention network, the Spatio-Temporal Channel Attention Network (STCANet), which can learn both short-range and long-range global feature maps. Extensive experiments are carried out to study the effectiveness of STCANet on three image-based benchmark datasets (Market-1501, DukeMTMC-ReID, and MSMT17) and two video-based benchmark datasets (DukeMTMC-VideoReID and MARS). K-reciprocal re-ranking of the gallery set is also applied, with which the proposed network shows a significant improvement on these datasets in comparison with the state of the art. Lastly, to study the generalizability of STCANet to unseen test instances, cross-validation on external cohorts is performed; the results indicate that the proposed model is robust and can be readily deployed in real-world applications.
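To make the three attention axes described above concrete, the following is a minimal NumPy sketch of attention applied across spatial positions, across frames, and across channels of a clip feature map. It is an illustration under stated assumptions, not the STCANet implementation: the function names (`self_attention`, `spatial_attention`, `temporal_attention`, `channel_attention`, `stca_sketch`), the plain dot-product attention without learned projections, the residual connection, and the spatial, temporal, channel ordering are all hypothetical choices, since the abstract only states that the STAM and CAM blocks are combined sequentially on top of a ResNet50 backbone.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Generic dot-product self-attention with a residual connection.

    tokens: array of shape (n, d). No learned query/key/value projections
    are used (an assumption made to keep the sketch short).
    Returns the attended tokens (n, d) and the attention weights (n, n).
    """
    n, d = tokens.shape
    weights = softmax(tokens @ tokens.T / np.sqrt(d), axis=-1)
    return tokens + weights @ tokens, weights


def spatial_attention(clip):
    """Relate spatial positions (body regions) within each frame.

    clip: (T, C, H, W) feature maps for T frames.
    """
    T, C, H, W = clip.shape
    out = np.empty_like(clip)
    for t in range(T):
        tokens = clip[t].reshape(C, H * W).T      # (H*W, C): one token per position
        attended, _ = self_attention(tokens)
        out[t] = attended.T.reshape(C, H, W)
    return out


def temporal_attention(clip):
    """Relate the same spatial position across all frames of the clip."""
    T, C, H, W = clip.shape
    tokens = clip.reshape(T, C, H * W).transpose(2, 0, 1)  # (H*W, T, C)
    out = np.empty_like(tokens)
    for p in range(H * W):
        out[p], _ = self_attention(tokens[p])     # one token per frame
    return out.transpose(1, 2, 0).reshape(T, C, H, W)


def channel_attention(clip):
    """Relate channels within each frame; each channel is one token."""
    T, C, H, W = clip.shape
    out = np.empty_like(clip)
    for t in range(T):
        tokens = clip[t].reshape(C, H * W)        # (C, H*W): one token per channel
        attended, _ = self_attention(tokens)
        out[t] = attended.reshape(C, H, W)
    return out


def stca_sketch(clip):
    """Apply the three blocks sequentially (the ordering is an assumption)."""
    return channel_attention(temporal_attention(spatial_attention(clip)))
```

Calling `stca_sketch` on a float array of shape `(T, C, H, W)`, for example backbone features for a short tracklet, returns an array of the same shape. For a single image, `T = 1` and the temporal step reduces to a residual copy of its input, which is consistent with the abstract's statement that the network handles both image and video inputs.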
Data availability
The Market-1501 and MARS datasets that support the findings of this study are publicly available on Kaggle at https://www.kaggle.com/datasets/pengcw1/market-1501 and https://www.kaggle.com/datasets/twoboysandhats/mars-motion-analysis-andreidentification-set, respectively. The MSMT17 dataset is not openly available due to privacy concerns but is available on reasonable request from https://www.pkuvmc.com/dataset.html. The DukeMTMC and DukeMTMC-VideoReID datasets are made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.txt). References for all data analyzed during this study are included in this article.
Acknowledgements
We acknowledge partial support from the National Center of Big Data and Cloud Computing (NCBC) and the Higher Education Commission (HEC) of Pakistan for conducting this research.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest associated with this publication.
About this article
Cite this article
Zulfiqar, F., Bajwa, U.I. & Raza, R.H. Multi-camera person re-identification using spatiotemporal context modeling. Neural Comput & Applic 35, 20117–20142 (2023). https://doi.org/10.1007/s00521-023-08799-0
DOI: https://doi.org/10.1007/s00521-023-08799-0