Abstract
Gait recognition captures gait patterns from the walking sequence of an individual for identification. Most existing gait recognition methods learn features from silhouettes or skeletons for robustness to clothing, carrying, and other exterior factors. The combination of the two data modalities, however, is not fully exploited. Previous multimodal gait recognition methods mainly employ the skeleton to assist local feature extraction, ignoring the intrinsic discriminative power of the skeleton data. To fill this gap and make full use of the two complementary data modalities, this paper proposes a simple yet effective Bimodal Fusion (BiFusion) network, which mines discriminative gait patterns in skeletons and integrates them with silhouette representations to learn rich features for better identification. In particular, the inherent hierarchical semantics of body joints in a skeleton are leveraged to design a novel Multi-Scale Gait Graph (MSGG) network for skeleton feature extraction. Extensive experiments on CASIA-B and OUMVLP demonstrate both the superiority of the proposed MSGG network in modeling skeletons and the effectiveness of the bimodal fusion for gait recognition. Under the most challenging condition of cross-clothing gait recognition on CASIA-B, our method achieves a rank-1 accuracy of 94.0%, which outperforms previous state-of-the-art methods by a large margin. The code is released at https://github.com/YunjiePeng/BimodalFusion.
Data Availability
The data that support the findings of this study are available on request from the Institute of Automation, Chinese Academy of Sciences (CASIA) (http://www.cbsr.ia.ac.cn/english/Gait%20Databases.asp) and the Institute of Scientific and Industrial Research (ISIR), Osaka University (OU) (http://www.am.sanken.osaka-u.ac.jp/BiometricDB/GaitMVLP.html).
Notes
The keypoints matrix is a 3D matrix that organizes the skeleton sequence data into a regular grid format. Each keypoint in a skeleton contains three initial features, i.e., the x and y coordinates of the keypoint in the frame and the confidence of the prediction.
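As an illustration of this layout, the following minimal sketch stacks per-frame skeletons into a frames × joints × 3 grid. The axis order, the helper names build_keypoints_matrix and shape, and the toy values are assumptions made for this example; the note does not specify them, and the released code may organize the data differently.

```python
from typing import List, Sequence, Tuple

# One keypoint carries the three initial features named in the note:
# (x, y, confidence).
Keypoint = Tuple[float, float, float]


def build_keypoints_matrix(
    frames: Sequence[Sequence[Keypoint]],
) -> List[List[List[float]]]:
    """Stack per-frame skeletons into a frames x joints x 3 nested list.

    The (frame, joint, feature) axis order is an assumption for illustration.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_joints = len(frames[0])
    matrix = []
    for t, skeleton in enumerate(frames):
        # A regular grid requires every frame to have the same joint count.
        if len(skeleton) != num_joints:
            raise ValueError(
                f"frame {t} has {len(skeleton)} joints, expected {num_joints}"
            )
        matrix.append([[float(x), float(y), float(c)] for x, y, c in skeleton])
    return matrix


def shape(matrix: List[List[List[float]]]) -> Tuple[int, int, int]:
    """Return (frames, joints, features) of the nested list."""
    return (len(matrix), len(matrix[0]), len(matrix[0][0]))


# Hypothetical toy sequence: 2 frames with 3 joints each.
toy_sequence = [
    [(10.0, 20.0, 0.9), (12.0, 40.0, 0.8), (11.0, 60.0, 0.7)],
    [(11.0, 20.5, 0.9), (13.0, 40.5, 0.85), (12.0, 60.5, 0.6)],
]
keypoints_matrix = build_keypoints_matrix(toy_sequence)
```

In practice such a grid would normally be held in an array or tensor type rather than nested lists; plain lists are used here only to keep the sketch dependency-free.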
Positive: right elbow, right knee, left elbow, left knee, right wrist, right ankle, left wrist, and left ankle. Negative: right shoulder, right hip, left shoulder, left hip. Positive and Negative nodes of the limbs spatial-temporal graph and the bodyparts spatial-temporal graph are similarly defined.
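The joint-level partition listed above can be written down as a simple lookup. This is a sketch only: the joint name strings, the function node_label, and the "other" label for joints that appear in neither list are assumptions for this example, since the note does not say how the remaining joints are treated.

```python
# Joint-level node sets, copied from the note above.
POSITIVE_NODES = frozenset({
    "right elbow", "right knee", "left elbow", "left knee",
    "right wrist", "right ankle", "left wrist", "left ankle",
})
NEGATIVE_NODES = frozenset({
    "right shoulder", "right hip", "left shoulder", "left hip",
})


def node_label(joint: str) -> str:
    """Return 'positive', 'negative', or 'other' for a joint name.

    'other' is a placeholder introduced here for joints outside both lists.
    """
    if joint in POSITIVE_NODES:
        return "positive"
    if joint in NEGATIVE_NODES:
        return "negative"
    return "other"


labels = {name: node_label(name) for name in ("right knee", "left hip", "nose")}
```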
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of Interests
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Peng, Y., Ma, K., Zhang, Y. et al. Learning rich features for gait recognition by integrating skeletons and silhouettes. Multimed Tools Appl 83, 7273–7294 (2024). https://doi.org/10.1007/s11042-023-15483-x
DOI: https://doi.org/10.1007/s11042-023-15483-x