
Symbiotic Adversarial Learning for Attribute-Based Person Search

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12359)

Abstract

Attribute-based person search is in significant demand for applications where no detected query images are available, such as identifying a criminal from a witness's description. However, the task is quite challenging because there is a large modality gap between images and physical descriptions of attributes. There may also be a large number of unseen categories (attribute combinations). Current state-of-the-art methods either focus on learning better cross-modal embeddings by mining only seen data, or explicitly use generative adversarial networks (GANs) to synthesize unseen features. The former tends to produce poor embeddings due to insufficient data, while the latter does not preserve intra-class compactness during generation. In this paper, we present a symbiotic adversarial learning framework, called SAL. Two GANs sit at the base of the framework in a symbiotic learning scheme: one synthesizes features of unseen classes/categories, while the other optimizes the embedding and performs cross-modal alignment on the common embedding space. Specifically, the two different types of GANs learn collaboratively throughout training, and their interactions are mutually beneficial. Extensive evaluations on two challenging pedestrian benchmarks, PETA and Market-1501, show SAL's superiority over nine state-of-the-art methods. The code is publicly available at: https://github.com/ycao5602/SAL.
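To make the retrieval setting concrete, the sketch below illustrates only the final search step on a common embedding space: an attribute query is mapped into the shared space and gallery image embeddings are ranked by similarity. It also shows what an "unseen category" means here, namely a query attribute combination that never occurs in the training set. This is a minimal sketch under stated assumptions, not SAL itself: the linear projection (`project`, `weights`), the cosine similarity score, and all names and toy values are hypothetical stand-ins, and neither of SAL's two GANs nor its training is implemented. The authors' actual code is in the repository given in the abstract.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def project(attrs: Vector, weights: List[List[float]]) -> List[float]:
    """Map a binary attribute vector into the shared embedding space.

    Assumption: a single hypothetical linear layer with `weights` of shape
    (embed_dim, num_attrs) stands in for a learned attribute encoder.
    """
    return [sum(w * a for w, a in zip(row, attrs)) for row in weights]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def rank_gallery(query_emb: Vector, gallery: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Rank gallery image embeddings by similarity to the query, best first."""
    scores = [(name, cosine(query_emb, emb)) for name, emb in gallery.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def is_unseen(attrs: Sequence[int], train_combos: Set[Tuple[int, ...]]) -> bool:
    """A category is an attribute combination; unseen if absent from training."""
    return tuple(attrs) not in train_combos


if __name__ == "__main__":
    # Toy attributes (hypothetical): [male, backpack, long_hair]
    query = [1, 1, 0]
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    gallery = {
        "img_a": [0.9, 0.8, 0.1],
        "img_b": [0.1, 0.2, 0.9],
        "img_c": [0.7, 0.0, 0.0],
    }
    ranking = rank_gallery(project(query, identity), gallery)
    print([name for name, _ in ranking])  # ranks img_a, then img_c, then img_b
    print(is_unseen(query, {(1, 0, 0), (0, 1, 1)}))  # True: combination not in training
```

The point of the toy split is the one the abstract raises: an embedding learned only from seen combinations must still place a query such as `[1, 1, 0]` sensibly, which is the gap SAL's feature-synthesizing GAN is meant to address.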

Keywords

Person search · Cross-modal retrieval · Adversarial learning

Notes

Acknowledgement

This research was supported by ARC FL-170100117, IH-180100002, LE-200100049.

Author Contributions

Yu-Tong Cao and Jingya Wang: equal contribution.

Supplementary material

Supplementary material 1: 504468_1_En_14_MOESM1_ESM.pdf (PDF, 368 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. UBTECH Sydney AI Centre, School of Computer Science, Faculty of Engineering, The University of Sydney, Darlington, Australia
