Cascaded Split-and-Aggregate Learning with Feature Recombination for Pedestrian Attribute Recognition

Abstract

Multi-label pedestrian attribute recognition in surveillance is inherently challenging because of factors such as poor imaging quality and large pose variations. In this paper, we improve its performance in two respects. (1) We propose cascaded Split-and-Aggregate Learning (SAL) to capture both the individuality and the commonality of all attributes, with one SAL module at the feature map level and the other at the feature vector level. In the former, we split out the features of each attribute using a purpose-designed attribute-specific attention module (ASAM). In the latter, the split features for each attribute are learned under constrained losses. In both modules, the split features are aggregated by several convolutional or fully connected layers. (2) We propose Feature Recombination (FR), which randomly shuffles the split features over a batch of samples to synthesize additional training samples that span the potential variability of the samples. To this end, we formulate a unified framework, CAScaded Split-and-Aggregate Learning with Feature Recombination (CAS-SAL-FR), that learns the above modules jointly and concurrently. Experiments on five popular benchmarks (RAP, PA-100K, PETA, and the Market-1501 and Duke attribute datasets) show that the proposed CAS-SAL-FR achieves new state-of-the-art performance.
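The Feature Recombination step can be illustrated with a small sketch. The code below is one plausible reading of the abstract, not the authors' implementation: it assumes that each sample already carries one split feature vector per attribute, and that recombination shuffles each attribute's features independently across the batch while keeping every feature paired with its own label. The function name `recombine`, the nested-list layout, and the toy values are all hypothetical.

```python
import random


def recombine(features, labels, rng):
    """Shuffle each attribute's split feature independently across a batch.

    features[i][j] is the split feature vector of attribute j for sample i,
    and labels[i][j] is the matching binary label. This layout is an
    assumption made for illustration, not taken from the paper.
    """
    batch_size = len(features)
    num_attrs = len(features[0])
    new_features = [[None] * num_attrs for _ in range(batch_size)]
    new_labels = [[None] * num_attrs for _ in range(batch_size)]
    for j in range(num_attrs):
        # Draw a separate permutation of the batch for every attribute.
        perm = list(range(batch_size))
        rng.shuffle(perm)
        for i, src in enumerate(perm):
            # Synthesized sample i borrows attribute j (feature and label)
            # from real sample src, so the pairing stays consistent.
            new_features[i][j] = features[src][j]
            new_labels[i][j] = labels[src][j]
    return new_features, new_labels


# Toy batch: 4 samples, 3 attributes, 2-dimensional split features.
# The value 10 * i + j encodes which sample and attribute a feature came from.
features = [[[float(10 * i + j)] * 2 for j in range(3)] for i in range(4)]
labels = [[(i + j) % 2 for j in range(3)] for i in range(4)]

syn_features, syn_labels = recombine(features, labels, random.Random(0))
```

Each synthesized sample mixes attribute features drawn from different real samples in the batch, which is the sense in which shuffling could widen the variability seen during training. In a real model the shuffled features would then pass through the aggregation layers; that part is omitted here.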

Acknowledgements

This work was partially supported by the National Key Research and Development Program (No. 2020YFC2003901), the Chinese National Natural Science Foundation Projects #61806203, #61961160704, #61876179, the External Cooperation Key Project of the Chinese Academy of Sciences #173211KYSB20200002, the Key Project of the General Logistics Department Grant No. AWS17J001, the Science and Technology Development Fund of Macau (No. 0010/2019/AFJ, 0025/2019/AKP, 0019/2018/ASC), the Spanish project TIN2016-74946-P (MINECO/FEDER, UE) and the CERCA Programme/Generalitat de Catalunya. This work was also supported in part by the Academy of Finland (Grants 336033, 315896), Business Finland (Grant 884/31/2018), and EU H2020 (Grant 101016775).

Author information

Corresponding author

Correspondence to Jun Wan.

Additional information

Communicated by Dima Damen.

About this article

Cite this article

Yang, Y., Tan, Z., Tiwari, P. et al. Cascaded Split-and-Aggregate Learning with Feature Recombination for Pedestrian Attribute Recognition. Int J Comput Vis (2021). https://doi.org/10.1007/s11263-021-01499-z

Keywords

  • Pedestrian attribute recognition
  • Attention
  • Split-and-aggregate learning
  • Feature recombination