Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-identification

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)


Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. Because of large intra-class variations and a cross-modality discrepancy compounded by a large amount of sample noise, it is difficult to learn discriminative part features. Existing VI-ReID methods instead tend to learn global representations, which have limited discriminability and weak robustness to noisy images. In this paper, we propose a novel dynamic dual-attentive aggregation (DDAG) learning method that mines both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID. We propose an intra-modality weighted-part attention module that extracts discriminative part-aggregated features by imposing domain knowledge on part relationship mining. To enhance robustness against noisy samples, we introduce cross-modality graph structured attention, which reinforces the representation with contextual relations across the two modalities. We also develop a parameter-free dynamic dual aggregation learning strategy that adaptively integrates the two components in a progressive joint training manner. Extensive experiments demonstrate that DDAG outperforms state-of-the-art methods under various settings.
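As a rough illustration of what attention-based aggregation over part features can look like, the sketch below refines each part vector with a softmax-weighted sum of all parts and then averages the result into one descriptor. It is a generic dot-product attention example, not the DDAG module itself: the function names (`attend_parts`, `aggregate`), the toy two-dimensional part vectors, and the plain averaging step are all assumptions made for illustration, and the abstract does not specify any of these details.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_parts(parts: List[Vector]) -> List[Vector]:
    """Refine each part with a softmax-weighted sum of all parts.

    Hypothetical helper: plain dot-product attention, assumed here
    only to illustrate part-relationship weighting.
    """
    refined = []
    for query in parts:
        weights = softmax([dot(query, key) for key in parts])
        refined.append([
            sum(w * key[d] for w, key in zip(weights, parts))
            for d in range(len(query))
        ])
    return refined


def aggregate(parts: List[Vector]) -> Vector:
    """Average the attention-refined parts into one descriptor."""
    refined = attend_parts(parts)
    count = len(refined)
    dim = len(refined[0])
    return [sum(p[d] for p in refined) / count for d in range(dim)]


# Toy input: three body-part features of dimension 2 (made-up values).
parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
descriptor = aggregate(parts)
print(descriptor)
```

Because each refined part is a convex combination of the input parts, the descriptor stays within the range of the original feature values; a learned module would typically add projections and a residual connection on top of this.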


Person re-identification · Graph attention · Cross-modality



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Inception Institute of Artificial Intelligence, Abu Dhabi, UAE
  2. Indiana University, Bloomington, USA
  3. University of Rochester, Rochester, USA
  4. Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
