
Learning rich features for gait recognition by integrating skeletons and silhouettes

Multimedia Tools and Applications

Abstract

Gait recognition identifies individuals by capturing gait patterns from their walking sequences. Most existing gait recognition methods learn features from either silhouettes or skeletons to gain robustness to clothing, carrying, and other exterior factors. The combination of the two data modalities, however, has not been fully exploited. Previous multimodal gait recognition methods mainly use the skeleton to assist local feature extraction, ignoring the intrinsic discriminative power of the skeleton data itself. To fill this gap and make full use of the two complementary modalities, this paper proposes a simple yet effective Bimodal Fusion (BiFusion) network, which mines discriminative gait patterns in skeletons and integrates them with silhouette representations to learn rich features for better identification. In particular, the inherent hierarchical semantics of body joints in a skeleton are leveraged to design a novel Multi-Scale Gait Graph (MSGG) network for skeleton feature extraction. Extensive experiments on CASIA-B and OUMVLP demonstrate both the superiority of the proposed MSGG network in modeling skeletons and the effectiveness of bimodal fusion for gait recognition. Under the most challenging cross-clothing condition on CASIA-B, our method achieves a rank-1 accuracy of 94.0%, outperforming previous state-of-the-art methods by a large margin. The code is released at https://github.com/YunjiePeng/BimodalFusion.
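Rank-1 accuracy, the metric quoted above, counts a probe sequence as correctly identified when its nearest gallery embedding belongs to the same subject. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: embeddings are plain Python lists, Euclidean distance is used, and the two modalities are fused by simple concatenation. `concat_fuse` is a hypothetical stand-in and not the BiFusion network's actual fusion. Benchmark protocols such as those used on CASIA-B add further rules (for example, per-view averaging) that are not shown here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def concat_fuse(skeleton_feat: Sequence[float], silhouette_feat: Sequence[float]) -> Vector:
    """Hypothetical fusion (assumption): concatenate the two modality embeddings."""
    return list(skeleton_feat) + list(silhouette_feat)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank1_accuracy(
    probe_feats: Sequence[Sequence[float]],
    probe_ids: Sequence[str],
    gallery_feats: Sequence[Sequence[float]],
    gallery_ids: Sequence[str],
) -> float:
    """Fraction of probes whose nearest gallery sample has the same subject ID."""
    if not probe_ids:
        raise ValueError("need at least one probe")
    correct = 0
    for feat, pid in zip(probe_feats, probe_ids):
        dists = [euclidean(feat, g) for g in gallery_feats]
        # Index of the closest gallery embedding (nearest neighbour).
        nearest = min(range(len(dists)), key=dists.__getitem__)
        if gallery_ids[nearest] == pid:
            correct += 1
    return correct / len(probe_ids)


if __name__ == "__main__":
    # Toy 2-D skeleton and 2-D silhouette embeddings (made-up numbers).
    gallery = [concat_fuse([0.0, 0.0], [0.0, 0.0]), concat_fuse([1.0, 1.0], [1.0, 1.0])]
    gallery_ids = ["001", "002"]
    probes = [
        concat_fuse([0.1, 0.0], [0.0, 0.1]),  # closest to subject 001: hit
        concat_fuse([0.9, 1.0], [1.0, 0.8]),  # closest to subject 002: hit
        concat_fuse([0.2, 0.1], [0.0, 0.0]),  # labeled 002 but closest to 001: miss
    ]
    probe_ids = ["001", "002", "002"]
    print(rank1_accuracy(probes, probe_ids, gallery, gallery_ids))
```

With two hits out of three probes, the toy example prints a rank-1 accuracy of 2/3.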

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Data Availability

The data that support the findings of this study are available on request from the Institute of Automation, Chinese Academy of Sciences (CASIA) (http://www.cbsr.ia.ac.cn/english/Gait%20Databases.asp) and the Institute of Scientific and Industrial Research (ISIR), Osaka University (OU) (http://www.am.sanken.osaka-u.ac.jp/BiometricDB/GaitMVLP.html).

Notes

  1. The keypoints matrix is a 3D matrix that organizes the skeleton sequence data into a regular grid format. Each keypoint in a skeleton carries three initial features: the x and y coordinates of the keypoint in the frame and the confidence of the prediction.

  2. Positive: right elbow, right knee, left elbow, left knee, right wrist, right ankle, left wrist, and left ankle. Negative: right shoulder, right hip, left shoulder, and left hip. Positive and negative nodes of the limbs spatial-temporal graph and the body-parts spatial-temporal graph are defined similarly.
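Notes 1 and 2 can be illustrated together with a small sketch: stack per-frame pose detections into a T x V x 3 keypoints matrix (frames x joints x (x, y, confidence)) and tag each joint with the positive/negative labels listed in Note 2. This is an illustration under assumptions, not the released implementation: it assumes the 17-keypoint COCO joint order (the notes do not state the pose estimator or joint order), fills undetected joints with zeros, and returns 0 for joints Note 2 does not list. `build_keypoints_matrix` and `node_sign` are hypothetical helper names.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed joint order: the 17-keypoint COCO layout (not stated in the notes).
COCO_JOINTS: List[str] = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Joint-level node sets copied from Note 2.
POSITIVE = {"right_elbow", "right_knee", "left_elbow", "left_knee",
            "right_wrist", "right_ankle", "left_wrist", "left_ankle"}
NEGATIVE = {"right_shoulder", "right_hip", "left_shoulder", "left_hip"}

Keypoint = Tuple[float, float, float]  # (x, y, confidence)


def build_keypoints_matrix(frames: Sequence[Dict[str, Keypoint]]) -> List[List[List[float]]]:
    """Stack per-frame detections into a T x V x 3 grid (frames x joints x features)."""
    matrix = []
    for frame in frames:
        # Joints missing from a frame are zero-filled (an assumption, not from the paper).
        row = [list(frame.get(name, (0.0, 0.0, 0.0))) for name in COCO_JOINTS]
        matrix.append(row)
    return matrix


def node_sign(name: str) -> int:
    """+1 for positive nodes, -1 for negative nodes, 0 for joints Note 2 does not list."""
    if name in POSITIVE:
        return 1
    if name in NEGATIVE:
        return -1
    return 0
```

For two frames, `build_keypoints_matrix` returns a 2 x 17 x 3 nested list. Of the 17 COCO joints, 8 are labeled positive, 4 negative, and the 5 face keypoints are left unlabeled.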

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Zhiqiang He.

Ethics declarations

Conflict of Interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Peng, Y., Ma, K., Zhang, Y. et al. Learning rich features for gait recognition by integrating skeletons and silhouettes. Multimed Tools Appl 83, 7273–7294 (2024). https://doi.org/10.1007/s11042-023-15483-x

  • DOI: https://doi.org/10.1007/s11042-023-15483-x
