Abstract
For video-based person re-identification (Re-ID), effectively aggregating video features is key to handling a variety of complicated situations. Unlike previous methods, which first extract spatial features and then aggregate temporal features, we propose a Multi-scale Context Aggregation (MSCA) method in this paper that learns spatial-temporal features from videos simultaneously. Specifically, we design an Attention-aided Feature Pyramid Network (AFPN), which recurrently aggregates the detail and semantic information of multi-scale feature maps from the CNN backbone. To focus the aggregation on the more salient regions of the video, we embed a dedicated Spatial-Channel Attention (SCA) module into each layer of the pyramid. To further enrich the feature representations with temporal information while the spatial features are being extracted, we design a Temporal Enhancement Module (TEM) that can be inserted into each layer of the backbone network in a plug-and-play manner. Comprehensive experiments on three standard video-based person Re-ID benchmarks demonstrate that our method is competitive with state-of-the-art methods.
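The abstract does not give the internals of the SCA module, so the following is only an illustrative sketch of the general spatial-channel attention idea it names (in the spirit of CBAM-style gating): a channel gate computed from spatially pooled statistics, followed by a spatial gate computed from channel-pooled statistics. The function name, pooling choices, and use of a plain sigmoid are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_channel_attention(feat):
    """Hypothetical spatial-channel attention gate over one feature map.

    feat: array of shape (C, H, W), one pyramid level's features.
    Returns a re-weighted feature map of the same shape.
    """
    # Channel attention: squeeze the spatial dims, gate each channel.
    channel_desc = feat.mean(axis=(1, 2))        # (C,)
    channel_gate = sigmoid(channel_desc)         # values in (0, 1)
    feat = feat * channel_gate[:, None, None]
    # Spatial attention: squeeze the channel dim, gate each location.
    spatial_desc = feat.mean(axis=0)             # (H, W)
    spatial_gate = sigmoid(spatial_desc)
    return feat * spatial_gate[None, :, :]

# Toy usage: an 8-channel 4x4 map keeps its shape after gating.
out = spatial_channel_attention(np.random.randn(8, 4, 4))
```

In the paper's pipeline, such a gate would sit inside each AFPN layer so that the multi-scale aggregation emphasizes salient person regions rather than background.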
Acknowledgements
This work is supported by National Natural Science Foundation of China (Nos. 62266009, 62276073, 61966004, 61962007), Guangxi Natural Science Foundation (Nos. 2019GXNSFDA245018, 2018GXNSFDA281009, 2018GXNSFDA294001), Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing, Innovation Project of Guangxi Graduate Education (YCSW2023187), and Guangxi “Bagui Scholar” Teams for Innovation and Research Project.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, L., Zhang, C., Li, Z., Hu, L. (2024). Multi-scale Context Aggregation for Video-Based Person Re-Identification. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1968. Springer, Singapore. https://doi.org/10.1007/978-981-99-8181-6_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8180-9
Online ISBN: 978-981-99-8181-6
eBook Packages: Computer Science (R0)