Attention-Based Network for Cross-View Gait Recognition

  • Yuanyuan Huang
  • Jianfu Zhang
  • Haohua Zhao
  • Liqing Zhang (Email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11307)


Existing gait recognition approaches based on convolutional neural networks (CNNs) extract features from different human body parts indiscriminately, without considering spatial heterogeneity. This may lose information that is discriminative for gait recognition, since body parts differ in shape, movement constraints and so on. In this work, we devise an attention-based embedding network to address this problem. The attention module incorporated in our network assigns different saliency weights to different parts of the feature maps at the pixel level. The embedding network embeds gait features into a low-dimensional latent space so that similarity can be measured simply by Euclidean distance. To achieve this goal, a combination of contrastive loss and triplet loss is used for training. Experiments demonstrate that the proposed network outperforms state-of-the-art methods on both the OULP and MVLP datasets under cross-view conditions. Notably, we achieve a 6.4\(\%\) improvement in rank-1 recognition accuracy under a 90\(^{\circ }\) angular difference on MVLP and a 3.6\(\%\) improvement under a 30\(^{\circ }\) angular difference on OULP.
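The two ideas in the abstract, pixel-level saliency weighting and a training objective that mixes contrastive and triplet terms over Euclidean distances, can be illustrated with a minimal sketch. This is not the authors' implementation. The sigmoid gating, the margin values, the equal weighting of the two losses, and all function names (`apply_attention`, `combined_loss`, and so on) are assumptions chosen for illustration, and the loss formulas are common textbook forms rather than the exact ones used in the paper.

```python
import math


def apply_attention(feature_map, scores):
    """Scale each pixel of a 2-D feature map by a saliency weight in (0, 1).

    Assumption: the weight is a sigmoid of a per-pixel score. The paper's
    attention module may compute its weights differently.
    """
    return [
        [f * (1.0 / (1.0 + math.exp(-s))) for f, s in zip(f_row, s_row)]
        for f_row, s_row in zip(feature_map, scores)
    ]


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, same_subject, margin=1.0):
    """Pull same-subject pairs together; push other pairs beyond the margin."""
    d = euclidean(a, b)
    if same_subject:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Ask the anchor to be closer to the positive than to the negative."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def combined_loss(anchor, positive, negative, weight=0.5, margin=1.0):
    """Weighted sum of the contrastive and triplet terms (weight is assumed)."""
    contrastive = (
        contrastive_loss(anchor, positive, True, margin)
        + contrastive_loss(anchor, negative, False, margin)
    )
    triplet = triplet_loss(anchor, positive, negative, margin)
    return weight * contrastive + (1.0 - weight) * triplet
```

With a well-separated triplet such as `combined_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])` both terms vanish, while swapping the positive and negative embeddings yields a large penalty. That is the behaviour an embedding trained for Euclidean matching is meant to learn.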


Gait recognition, Attention mechanism, Embedding learning



The work was supported by the Key Basic Research Program of Shanghai Municipality, China (15JC1400103, 16JC1402800) and the National Basic Research Program of China (Grant No. 2015CB856004).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yuanyuan Huang (1)
  • Jianfu Zhang (1)
  • Haohua Zhao (1)
  • Liqing Zhang (1), Email author
  1. Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
