Cascaded Split-and-Aggregate Learning with Feature Recombination for Pedestrian Attribute Recognition

Yang, Yang; Tan, Zichang; Tiwari, Prayag; Pandey, Hari Mohan; Wan, Jun; Lei, Zhen; Guo, Guodong; Li, Stan Z.

doi:10.1007/s11263-021-01499-z

Published: 18 July 2021

Volume 129, pages 2731–2744, (2021)
International Journal of Computer Vision

Yang Yang^1,2 (ORCID: orcid.org/0000-0003-0559-5464),
Zichang Tan^4,5,
Prayag Tiwari^6 (ORCID: orcid.org/0000-0002-2851-4260),
Hari Mohan Pandey^7,
Jun Wan^1,2,
Zhen Lei^1,2,3,
Guodong Guo^4,5 &
Stan Z. Li^1

Abstract

Multi-label pedestrian attribute recognition in surveillance is inherently challenging because of poor imaging quality, large pose variations, and similar factors. In this paper, we improve its performance in two ways. (1) We propose cascaded Split-and-Aggregate Learning (SAL) to capture both the individuality and the commonality of all attributes, with one SAL module at the feature map level and the other at the feature vector level. For the former, we split the features of each attribute using a purpose-designed attribute-specific attention module (ASAM). For the latter, the split features for each attribute are learned using constrained losses. In both modules, the split features are aggregated using several convolutional or fully connected layers. (2) We propose Feature Recombination (FR), which randomly shuffles the split features over a batch of samples to synthesize more training samples that span the potential variability of samples. To this end, we formulate a unified framework, named CAScaded Split-and-Aggregate Learning with Feature Recombination (CAS-SAL-FR), to learn the above modules jointly and concurrently. Experiments on five popular benchmarks, the RAP, PA-100K, PETA, Market-1501 and Duke attribute datasets, show that the proposed CAS-SAL-FR achieves new state-of-the-art performance.
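The Feature Recombination idea in the abstract can be illustrated with a minimal NumPy sketch. This is one possible reading of "a random shuffle based on the split features over a batch of samples", not the authors' implementation: it assumes that each sample already has one split feature vector per attribute, that each attribute is shuffled across the batch independently, and that the attribute labels travel with their features. The function name `recombine_features` and the array shapes are hypothetical.

```python
import numpy as np


def recombine_features(split_feats, labels, rng):
    """Shuffle each attribute's split feature independently across the batch.

    Assumed shapes (illustrative, not taken from the paper):
      split_feats: (batch, num_attrs, dim), one feature vector per attribute
      labels:      (batch, num_attrs), binary attribute labels
    Returns synthesized features and their matching labels, same shapes.
    """
    batch, num_attrs, _ = split_feats.shape
    # One independent permutation of the batch indices per attribute.
    perms = np.stack(
        [rng.permutation(batch) for _ in range(num_attrs)], axis=1
    )  # (batch, num_attrs)
    attr_idx = np.arange(num_attrs)[None, :]  # (1, num_attrs), broadcast below
    # new_feats[i, a] is the feature of attribute a from sample perms[i, a].
    new_feats = split_feats[perms, attr_idx]
    # Labels follow their features so each (feature, label) pair stays intact.
    new_labels = labels[perms, attr_idx]
    return new_feats, new_labels


rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3, 8))        # 4 samples, 3 attributes, 8-dim
labels = rng.integers(0, 2, size=(4, 3))  # binary label per attribute
new_feats, new_labels = recombine_features(feats, labels, rng)
print(new_feats.shape, new_labels.shape)  # (4, 3, 8) (4, 3)
```

Each synthesized row mixes attribute features drawn from different pedestrians in the batch, which is how a shuffle of this kind could produce attribute combinations that no single training image contains.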

Acknowledgements

This work was partially supported by the National Key Research and Development Program (No. 2020YFC2003901), the Chinese National Natural Science Foundation Projects #61806203, #61961160704, #61876179, the External cooperation key project of Chinese Academy of Sciences #173211KYSB20200002, the Key Project of the General Logistics Department Grant No. AWS17J001, Science and Technology Development Fund of Macau (No. 0010/2019/AFJ, 0025/2019/AKP, 0019/2018/ASC), the Spanish project TIN2016-74946-P (MINECO/FEDER, UE) and CERCA Programme/Generalitat de Catalunya. This work was also supported in part by the Academy of Finland (Grants 336033, 315896), Business Finland (Grant 884/31/2018), and EU H2020 (Grant 101016775).

Author information

Yang Yang and Zichang Tan contributed equally to this work.

Authors and Affiliations

Center for Biometrics and Security Research, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yang Yang, Jun Wan, Zhen Lei & Stan Z. Li
School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), Beijing, China
Yang Yang, Jun Wan & Zhen Lei
Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Hong Kong, China
Zhen Lei
Institute of Deep Learning, Baidu Research, Beijing, China
Zichang Tan & Guodong Guo
National Engineering Laboratory for Deep Learning Technology and Application, Beijing, China
Zichang Tan & Guodong Guo
Department of Computer Science, Aalto University, Espoo, Finland
Prayag Tiwari
Department of Computer Science, Edge Hill University, Ormskirk, UK
Hari Mohan Pandey

Corresponding author

Correspondence to Jun Wan.

Additional information

Communicated by Dima Damen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, Y., Tan, Z., Tiwari, P. et al. Cascaded Split-and-Aggregate Learning with Feature Recombination for Pedestrian Attribute Recognition. Int J Comput Vis 129, 2731–2744 (2021). https://doi.org/10.1007/s11263-021-01499-z

Received: 27 June 2020
Accepted: 05 June 2021
Published: 18 July 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s11263-021-01499-z
