Advertisement

READ: Reciprocal Attention Discriminator for Image-to-Video Re-identification

Conference paper
  • 512 Downloads
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12359)

Abstract

Person re-identification (re-ID) is the problem of visually identifying a person given a database of identities. In this work, we focus on image-to-video re-ID which compares a single query image to videos in the gallery. The main challenge is the asymmetry association of an image and a video, and overcoming the difference caused by the additional temporal dimension. To this end, we propose an attention-aware discriminator architecture. The attention occurs across different modalities, and even different identities to aggregate useful spatio-temporal information for comparison. The information is effectively fused into a united feature, followed by the final prediction of a similarity score. The performance of the method is shown with image-to-video person re-identification benchmarks (DukeMTMC-VideoReID, and MARS).

Keywords

Image and video understanding Identity retrieval Re-identification Attention 

Supplementary material

504468_1_En_20_MOESM1_ESM.pdf (8 mb)
Supplementary material 1 (pdf 8206 KB)

References

  1. 1.
    Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: CVPR (2017)Google Scholar
  2. 2.
    Chen, B., Deng, W., Hu, J.: Mixed high-order attention network for person re-identification. In: ICCV (2019)Google Scholar
  3. 3.
    Chen, W., Chen, X., Zhang, J., Huang, K.: Beyond triplet loss: a deep quadruplet network for person re-identification. In: CVPR (2017)Google Scholar
  4. 4.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)Google Scholar
  5. 5.
    Feichtenhofer, C., Fan, H., Malik, J., He, K.: Slowfast networks for video recognition. In: ICCV (2019)Google Scholar
  6. 6.
    Fu, Y., Wang, X., Wei, Y., Huang, T.: STA: spatial-temporal attention for large-scale video-based person re-identification. In: AAAI (2019)Google Scholar
  7. 7.
    Gu, X., Ma, B., Chang, H., Shan, S., Chen, X.: Temporal knowledge propagation for image-to-video person re-identification. In: ICCV (2019)Google Scholar
  8. 8.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)Google Scholar
  9. 9.
    Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv preprint (2017)Google Scholar
  10. 10.
    Ho, H.I., Shim, M., Wee, D.: Learning from dances: pose-invariant re-identification for multi-person tracking. In: ICASSP (2020)Google Scholar
  11. 11.
    Hou, R., Ma, B., Chang, H., Gu, X., Shan, S., Chen, X.: Interaction-and-aggregation network for person re-identification. In: CVPR (2019)Google Scholar
  12. 12.
    Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)Google Scholar
  13. 13.
    Li, D., Chen, X., Zhang, Z., Huang, K.: Learning deep context-aware features over body and latent parts for person re-identification. In: CVPR (2017)Google Scholar
  14. 14.
    Li, S., Bak, S., Carr, P., Wang, X.: Diversity regularized spatiotemporal attention for video-based person re-identification. In: CVPR (2018)Google Scholar
  15. 15.
    Li, W., Zhu, X., Gong, S.: Harmonious attention network for person re-identification. In: CVPR (2018)Google Scholar
  16. 16.
    Li, Y.J., Chen, Y.C., Lin, Y.Y., Du, X., Wang, Y.C.F.: Recover and identify: a generative dual model for cross-resolution person re-identification. In: ICCV (2019)Google Scholar
  17. 17.
    Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: CVPR (2015)Google Scholar
  18. 18.
    Liu, C.T., Wu, C.W., Wang, Y.C.F., Chien, S.Y.: Spatially and temporally efficient non-local attention network for video-based person re-identification. In: BMVC (2019)Google Scholar
  19. 19.
    Liu, D., Wen, B., Fan, Y., Loy, C.C., Huang, T.S.: Non-local recurrent network for image restoration. In: NeurIPS (2018)Google Scholar
  20. 20.
    Liu, X., et al.: HydraPlus-Net: attentive deep features for pedestrian analysis. In: ICCV (2017)Google Scholar
  21. 21.
    Liu, Y., Yuan, Z., Zhou, W., Li, H.: Spatial and temporal mutual promotion for video-based person re-identification. In: AAAI (2019)Google Scholar
  22. 22.
    Oh, S.W., Lee, J.Y., Xu, N., Kim, S.J.: Video object segmentation using space-time memory networks. In: ICCV (2019)Google Scholar
  23. 23.
    Pedregosa, F., et al.: Scikit-learn: machine learning in Python. JMLR 12, 2825–2830 (2011)MathSciNetzbMATHGoogle Scholar
  24. 24.
    Shen, Y., Li, H., Yi, S., Chen, D., Wang, X.: Person re-identification with deep similarity-guided graph neural network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 508–526. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01267-0_30CrossRefGoogle Scholar
  25. 25.
    Su, C., Li, J., Zhang, S., Xing, J., Gao, W., Tian, Q.: Pose-driven deep convolutional model for person re-identification. In: ICCV (2017)Google Scholar
  26. 26.
    Suh, Y., Wang, J., Tang, S., Mei, T., Lee, K.M.: Part-aligned bilinear representations for person re-identification. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 418–437. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01264-9_25CrossRefGoogle Scholar
  27. 27.
    Sun, Y., et al.: Perceive where to focus: learning visibility-aware part-level features for partial person re-identification. In: CVPR (2019)Google Scholar
  28. 28.
    Sun, Y., Zheng, L., Yang, Y., Tian, Q., Wang, S.: Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 501–518. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01225-0_30CrossRefGoogle Scholar
  29. 29.
    Szegedy, C., et al.: Going deeper with convolutions. In: CVPR (2015)Google Scholar
  30. 30.
    Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: ICCV (2015)Google Scholar
  31. 31.
    Vaswani, A., et al.: Attention is all you need. In: NIPS (2017)Google Scholar
  32. 32.
    Wang, G., Lai, J., Xie, X.: P2SNet: can an image match a video for person re-identification in an end-to-end way? TCSVT 28(10), 2777–2787 (2017)Google Scholar
  33. 33.
    Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: CVPR (2018)Google Scholar
  34. 34.
    Wu, Y., Lin, Y., Dong, X., Yan, Y., Ouyang, W., Yang, Y.: Exploit the unknown gradually: one-shot video-based person re-identification by stepwise learning. In: CVPR (2018)Google Scholar
  35. 35.
    Zhang, D., Wu, W., Cheng, H., Zhang, R., Dong, Z., Cai, Z.: Image-to-video person re-identification with temporally memorized similarity learning. TCSVT 28(10), 2622–2632 (2017)Google Scholar
  36. 36.
    Zhang, Y., Li, X., Zhang, Z.: Learning a key-value memory co-attention matching network for person re-identification. In: AAAI (2019)Google Scholar
  37. 37.
    Zhang, Y., Zhong, Q., Ma, L., Xie, D., Pu, S.: Learning incremental triplet margin for person re-identification. In: AAAI (2019)Google Scholar
  38. 38.
    Zhao, H., et al.: Spindle net: person re-identification with human body region guided feature decomposition and fusion. In: CVPR (2017)Google Scholar
  39. 39.
    Zhao, L., Li, X., Zhuang, Y., Wang, J.: Deeply-learned part-aligned representations for person re-identification. In: ICCV (2017)Google Scholar
  40. 40.
    Zheng, L., et al.: MARS: a video benchmark for large-scale person re-identification. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 868–884. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46466-4_52CrossRefGoogle Scholar
  41. 41.
    Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: ICCV (2015)Google Scholar
  42. 42.
    Zheng, L., Yang, Y., Hauptmann, A.G.: Person re-identification: past, present and future. arXiv preprint (2016)Google Scholar
  43. 43.
    Zhu, X., Jing, X.Y., Wu, F., Wang, Y., Zuo, W., Zheng, W.S.: Learning heterogeneous dictionary pair with feature projection matrix for pedestrian video retrieval via single query image. In: AAAI (2017)Google Scholar
  44. 44.
    Zhu, X., Jing, X.Y., You, X., Zuo, W., Shan, S., Zheng, W.S.: Image to video person re-identification by learning heterogeneous dictionary pair with feature projection matrix. TIFS 13(3), 717–732 (2017)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.SeoulSouth Korea
  2. 2.Department of Computer ScienceETH ZürichZürichSwitzerland
  3. 3.School of Electrical EngineeringKAISTDaejeonSouth Korea
  4. 4.Clova AI, NAVER CorporationSeongnamSouth Korea

Personalised recommendations