Pedestrian attribute recognition based on attribute correlation

Zhao, Ruijie; Lang, Congyan; Li, Zun; Liang, Liqian; Wei, Lili; Feng, Songhe; Wang, Tao

doi:10.1007/s00530-022-00893-y

Regular Paper
Published: 14 February 2022

Volume 28, pages 1069–1081, (2022)
Multimedia Systems

Ruijie Zhao¹, Congyan Lang¹, Zun Li¹, Liqian Liang¹, Lili Wei¹, Songhe Feng¹ & Tao Wang¹

Abstract

Pedestrian attribute recognition is widely used in pedestrian tracking and pedestrian re-identification. The task faces two fundamental challenges: one comes from its multi-label nature, and the other from characteristics of the data samples, such as class imbalance and partial occlusion. To tackle these challenges, we propose a Cross Attribute and Feature Network (CAFN) that exploits the correlations between any pair of attributes. Concretely, CAFN contains two modules: a Cross-attribute Attention Module (C2AM) and a Cross-feature Attention Module (CFAM). C2AM enables the network to learn, during training, a relation matrix that quantifies the correlation between any pair of attributes in the attribute set; CFAM fuses different attribute features to generate more accurate and robust attribute features. Extensive experiments demonstrate that the proposed CAFN performs favorably compared with state-of-the-art approaches.
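
The abstract does not give the formulation of C2AM or CFAM, so the following is only a minimal sketch of the general idea of a pairwise relation matrix over attributes, not the authors' implementation. It assumes that raw pairwise scores (learnable parameters in a real network, fixed numbers here) are row-normalised with a softmax, and that each attribute feature is then replaced by a relation-weighted mix of all attribute features. The function names, the softmax normalisation, and the toy inputs are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relation_matrix(scores):
    """Turn raw pairwise scores (A x A) into row-normalised weights.

    Assumption: softmax normalisation; the paper may use something else.
    """
    return [softmax(row) for row in scores]


def fuse_attribute_features(features, scores):
    """Replace each attribute feature with a relation-weighted mix of all of them.

    features: list of A vectors (one per attribute), each of length D.
    scores:   A x A raw pairwise scores (hypothetical stand-in for learned values).
    """
    weights = relation_matrix(scores)
    dim = len(features[0])
    return [
        [sum(w * feat[d] for w, feat in zip(w_row, features)) for d in range(dim)]
        for w_row in weights
    ]


# Three hypothetical attributes with 2-D features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Equal scores mean every attribute attends to all attributes equally,
# so each fused feature is the plain average of the three inputs.
uniform_scores = [[0.0] * 3 for _ in range(3)]
fused = fuse_attribute_features(features, uniform_scores)
```

With equal scores the fused features collapse to the average of the inputs, while scores that are much larger on the diagonal leave each attribute feature almost unchanged. In a trained model the scores would sit between these extremes and reflect which attributes tend to co-occur.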

Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities (No. 2020JBM403), the National Natural Science Foundation of China (No. 62072027, No. 61872032), and the Beijing Natural Science Foundation (Grants No. 4202057, No. 4202058, No. 4202060).

Author information

Authors and Affiliations

Beijing Jiaotong University, Beijing, 100044, China
Ruijie Zhao, Congyan Lang, Zun Li, Liqian Liang, Lili Wei, Songhe Feng & Tao Wang

Corresponding author

Correspondence to Congyan Lang.

Additional information

Communicated by Z. Liu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, R., Lang, C., Li, Z. et al. Pedestrian attribute recognition based on attribute correlation. Multimedia Systems 28, 1069–1081 (2022). https://doi.org/10.1007/s00530-022-00893-y

Received: 08 September 2021
Accepted: 12 January 2022
Published: 14 February 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s00530-022-00893-y
