Learning Efficient Spatial-Temporal Gait Features with Deep Learning for Human Identification

  • Original Article
  • Published in: Neuroinformatics

Abstract

The integration of the latest breakthroughs in bioinformatics technology on one side and artificial intelligence on the other enables remarkable advances in fields such as intelligent security, computational biology, and healthcare. Among these, biometrics-based automatic human identification is one of the most fundamental and significant research topics. Human gait, a biometric feature with unique capability, has gained significant attention in biometrics-based human identification for its remarkable characteristics of remote accessibility, robustness, and security. However, existing methods cannot handle well the indistinctive inter-class differences and large intra-class variations of human gait in real-world situations. In this paper, we develop efficient spatial-temporal gait features with deep learning for human identification. First, we propose a gait energy image (GEI) based Siamese neural network to automatically extract robust and discriminative spatial gait features for human identification. Furthermore, we exploit deep 3-dimensional convolutional networks to learn convolutional 3D (C3D) representations of human gait as temporal gait features. Finally, the GEI and C3D gait features are embedded into a null space by the Null Foley-Sammon Transform (NFST). In the new space, the spatial-temporal features are combined with distance metric learning that drives the similarity metric to be small for pairs of gaits from the same person and large for pairs from different persons. Experiments on the world's largest gait database show that our framework impressively outperforms state-of-the-art methods.
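The three building blocks named above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the authors' implementation: the toy arrays, the function names, and the simplified NFST (which omits the full method's positive total-scatter constraint) are all assumptions made for exposition.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """GEI (Han & Bhanu, 2006): average aligned binary silhouettes
    (T, H, W) over one gait cycle; each pixel holds the fraction of
    frames in which it is foreground."""
    return np.asarray(silhouettes, dtype=np.float64).mean(axis=0)

def contrastive_loss(d, same, margin=1.0):
    """Contrastive loss (Chopra et al., 2005) used to train a Siamese
    network: genuine pairs are pulled together (loss = d^2), impostor
    pairs are pushed beyond the margin (loss = max(0, margin - d)^2)."""
    d = np.asarray(d, dtype=np.float64)
    same = np.asarray(same, dtype=np.float64)
    return same * d**2 + (1.0 - same) * np.maximum(0.0, margin - d)**2

def nfst_basis(X, labels, tol=1e-10):
    """Simplified NFST (Guo et al., 2006): a basis of the null space of
    the within-class scatter, so every training sample of a class
    collapses to a single point after projection.  (The full transform
    also keeps only directions with positive total scatter; that step
    is omitted here for brevity.)"""
    X = np.asarray(X, dtype=np.float64)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(labels):
        diff = X[labels == c] - X[labels == c].mean(axis=0)
        Sw += diff.T @ diff                      # within-class scatter
    evals, evecs = np.linalg.eigh(Sw)            # ascending eigenvalues
    return evecs[:, evals < tol * max(evals.max(), 1.0)]

# Toy demo: a 4-frame "cycle" -- GEI pixel value = on-fraction.
frames = np.zeros((4, 8, 6))
frames[:, 2:6, 2:4] = 1.0        # "torso" pixels on in every frame
frames[0, 6, 1] = 1.0            # a "leg" pixel on in one frame only
gei = gait_energy_image(frames)

# Toy NFST: 4 samples of 2 identities in a 4-D feature space.
X = np.array([[1., 0., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 1., 1.]])
y = np.array([0, 0, 1, 1])
P = X @ nfst_basis(X, y)         # each identity collapses to one point
```

In this sketch the GEI plays the role of the spatial input, the contrastive loss drives the Siamese metric (small distance for same-person pairs, large for different persons), and the NFST projection is where the spatial and temporal features would be fused.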


[Figs. 1–6: figure images not included in this preview]

References

  • Ariyanto, G., & Nixon, M.S. (2011). Model based 3d gait biometrics. In Proceedings of international joint conference on biometrics, pp. 1–7. IEEE.

  • Bobick, A.F., & Johnson, A.Y. (2001). Gait recognition using static, activity-specific parameters. In Proceedings of IEEE conference on computer vision and pattern recognition, vol. 1, pp. I–I. IEEE.

  • Boykov, Y., & Jolly, M. (2001). Interactive graph cuts for optimal boundary & region segmentation of objects in nd images. In Proceedings of international conference on computer vision, vol. 1, pp. 105–112. IEEE.

  • Bromley, J., Bentz, J.W., Bottou, L., Guyon, I., LeCun, Y., Moore, C., Säckinger, E., Shah, R. (1993). Signature verification using a siamese time delay neural network. International Journal of Pattern Recognition and Artificial Intelligence, 7(04), 669–688.

  • Cao, C., Zhang, Y., Zhang, C., Lu, H. (2017). Body joint guided 3d deep convolutional descriptors for action recognition. CoRR arXiv:1704.07160.

  • Castro, F.M., Marín-Jiménez, M.J., Guil, N., de la Blanca, N.P. (2016). Automatic learning of gait signatures for people identification. CoRR arXiv:1603.01006.

  • Chen, Z., Ngo, C., Zhang, W., Cao, J., Jiang, Y. (2014). Name-face association in web videos: A large-scale dataset, baselines, and open issues. J. Comput. Sci. Technol, 29(5), 785–798.

  • Chen, Z., Zhang, W., Deng, B., Xie, H., Gu, X. (2017). Name-face association with web facial image supervision. Multimedia Systems (4), 1–20.

  • Chopra, S., Hadsell, R., LeCun, Y. (2005). Learning a similarity metric discriminatively, with application to face verification. In Proceedings of IEEE conference on computer vision and pattern recognition, vol. 1, pp. 539–546. IEEE.

  • Feng, Y., Li, Y., Luo, J. (2016). Learning effective gait features using lstm. In 23rd international conference on pattern recognition, pp. 325–330. IEEE.

  • Gan, C., Wang, N., Yang, Y., Yeung, D., Hauptmann, A.G. (2015). Devnet: A deep event network for multimedia event detection and evidence recounting. In IEEE conference on computer vision and pattern recognition, pp. 2568–2577.

  • Gao, J., Yang, Z., Sun, C., Chen, K., Nevatia, R. (2017). TURN TAP: temporal unit regression network for temporal action proposals. CoRR arXiv:1703.06189.

  • Guo, Y.F., Wu, L., Lu, H., Feng, Z., Xue, X. (2006). Null foley–sammon transform. Pattern recognition, 39(11), 2248–2251.

  • Han, J., & Bhanu, B. (2006). Individual recognition using gait energy image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(2), 316–322.

  • He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.

  • Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82–97.

  • Hou, R., Chen, C., Shah, M. (2017). Tube convolutional neural network (T-CNN) for action detection in videos. CoRR arXiv:1703.10664.

  • Hu, M., Wang, Y., Zhang, Z., Zhang, D. (2011). Gait-based gender classification using mixed conditional random field. IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, 41(5), 1429–1439.

  • Hu, M., Wang, Y., Zhang, Z., Zhang, D., Little, J.J. (2013). Incremental learning for video-based gait recognition with LBP flow. IEEE Transactions Cybernetics, 43(1), 77–89.

  • Iwama, H., Okumura, M., Makihara, Y., Yagi, Y. (2012). The ou-isir gait database comprising the large population dataset and performance evaluation of gait recognition. IEEE Transactions on Information Forensics and Security, 7(5), 1511–1521.

  • Ji, S., Xu, W., Yang, M., Yu, K. (2013). 3d convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 221–231.

  • Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Li, F.F. (2014). Large-scale video classification with convolutional neural networks. In Proceedings of IEEE conference on computer vision and pattern recognition, pp. 1725–1732. IEEE.

  • Krizhevsky, A., Sutskever, I., Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105.

  • Kusakunniran, W. (2014). Attribute-based learning for gait recognition using spatio-temporal interest points. Image Vision Comput, 32(12), 1117–1126.

  • Lam, T.H.W., Cheung, K.H., Liu, J.N.K. (2011). Gait flow image: A silhouette-based gait representation for human identification. Pattern Recognition, 44(4), 973–987.

  • Liu, W., Mei, T., Zhang, Y. (2014). Instant mobile video search with layered audio-video indexing and progressive transmission. IEEE Transactions on Multimedia, 16(8), 2242–2255.

  • Liu, W., Mei, T., Zhang, Y., Che, C., Luo, J. (2015). Multi-task deep visual-semantic embedding for video thumbnail selection. In IEEE conference on computer vision and pattern recognition, pp. 3707–3715.

  • Liu, W., Zhang, Y., Tang, S., Tang, J., Hong, R., Li, J. (2013). Accurate estimation of human body orientation from RGB-D sensors. IEEE Transactions on Cybernetics, 43(5), 1442–1452.

  • Lombardi, S., Nishino, K., Makihara, Y., Yagi, Y. (2013). Two-point gait: decoupling gait from body shape. In IEEE international conference on computer vision, pp. 1041–1048.

  • Ma, H., & Liu, W. (2017). Progressive search paradigm for internet of things. IEEE Multimedia. https://doi.org/10.1109/MMUL.2017.265091429.

  • Ma, H., Zeng, C., Ling, C.X. (2012). A reliable people counting system via multiple cameras. ACM Transaction on Intelligent Systems and Technology, 3(2), 31.

  • Makihara, Y., Rossa, B.S., Yagi, Y. (2012). Gait recognition using images of oriented smooth pseudo motion. In Proceedings of the IEEE international conference on systems, Man, and Cybernetics, SMC 2012, Seoul, Korea (South), October 14-17, 2012, pp. 1309–1314.

  • Makihara, Y., Sagawa, R., Mukaigawa, Y., Echigo, T., Yagi, Y. (2006). Gait recognition using a view transformation model in the frequency domain. In Proceedings of european conference on computer vision, pp. 151–163.

  • Mannini, A., Trojaniello, D., Cereatti, A., Sabatini, A.M. (2016). A machine learning framework for gait classification using inertial sensors: Application to elderly, post-stroke and huntington’s disease patients. Sensors, 16(1), 134.

  • Martín-Félez, R., & Xiang, T. (2012). Gait recognition by ranking. In Proceedings of european conference on computer vision, pp. 328–341. Springer.

  • Muja, M., & Lowe, D.G. (2012). Fast matching of binary features. In Proceedings of computer and robot vision, pp. 404–410.

  • Muramatsu, D., Shiraishi, A., Makihara, Y., Uddin, M., Yagi, Y. (2015). Gait-based person recognition using arbitrary view transformation model. IEEE Transactions on Image Processing, 24(1), 140–154.

  • Nie, B.X., Xiong, C., Zhu, S. (2015). Joint action recognition and pose estimation from video. In Proceedings of IEEE conference on computer vision and pattern recognition, pp. 1293–1301.

  • Ren, P., Tang, S., Fang, F., Luo, L., Xu, L., Bringas-Vega, M.L., Yao, D., Kendrick, K.M., Valdes-Sosa, P.A. (2017). Gait rhythm fluctuation analysis for neurodegenerative diseases by empirical mode decomposition. IEEE Transactions Biomed. Engineering, 64(1), 52–60.

  • Samà, A., Pérez-López, C., Martín, D.R., Català, A., Aróstegui, J.M., Cabestany, J., de Mingo, E., Rodríguez-Molinero, A. (2017). Estimating bradykinesia severity in parkinson's disease by analysing gait through a waist-worn sensor. Computers in Biology and Medicine, 84, 114–123.

  • Sarkar, S., Phillips, P.J., Liu, Z., Vega, I.R., Grother, P., Bowyer, K.W. (2005). The humanid gait challenge problem: Data sets, performance, and analysis. IEEE Transactions Pattern Anal. Mach. Intell, 27(2), 162–177.

  • Shiraga, K., Makihara, Y., Muramatsu, D., Echigo, T., Yagi, Y. (2016). Geinet: View-invariant gait recognition using a convolutional neural network. In Proceedings of international conference on biometrics, pp. 1–8.

  • Sigal, L., Isard, M., Haussecker, H.W., Black, M.J. (2012). Loose-limbed people: Estimating 3d human pose and motion using non-parametric belief propagation. International Journal of Computer Vision, 98(1), 15–48.

  • Sivapalan, S., Chen, D., Denman, S., Sridharan, S., Fookes, C. (2013). Histogram of weighted local directions for gait recognition. In Proceedings of computer vision and pattern recognition workshop, pp. 125–130. IEEE.

  • Sutskever, I., Vinyals, O., Le, Q.V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pp. 3104–3112.

  • Tao, D., Li, X., Wu, X., Maybank, S.J. (2007). General tensor discriminant analysis and gabor features for gait recognition. IEEE Transactions Pattern Anal. Mach. Intell, 29(10), 1700–1715.

  • Tran, D., Bourdev, L.D., Fergus, R., Torresani, L., Paluri, M. (2015). Learning spatiotemporal features with 3d convolutional networks. In Proceedings of international conference on computer vision, pp. 4489–4497.

  • Urtasun, R., & Fua, P. (2004). 3d tracking for gait characterization and recognition. In Proceedings of 6th IEEE international conference on automatic face and gesture recognition, pp. 17–22.

  • Varol, G., Laptev, I., Schmid, C. (2016). Long-term temporal convolutions for action recognition. CoRR arXiv:1604.04494.

  • Wang, B., Tang, S., Zhao, R., Liu, W., Cen, Y. (2015). Pedestrian detection based on region proposal fusion. In Proceedings of international workshop on multimedia signal processing, pp. 1–6. IEEE.

  • Wang, C., Wang, Y., Lin, Z., Yuille, A.L., Gao, W. (2014). Robust estimation of 3d human poses from a single image. In Proceedings of IEEE conference on computer vision and pattern recognition, pp. 2369–2376.

  • Wang, C., Zhang, J., Pu, J., Yuan, X., Wang, L. (2010). Chrono-gait image: A novel temporal template for gait recognition. In Computer Vision - ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part I, pp. 257–270.

  • Wang, L., Ning, H., Tan, T., Hu, W. (2004). Fusion of static and dynamic body biometrics for gait recognition. IEEE Transactions Circuits Syst. Video Techn, 14(2), 149–158.

  • Wang, L., Tan, T., Ning, H., Hu, W. (2003). Silhouette analysis-based gait recognition for human identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12), 1505–1518.

  • Wu, Z., Huang, Y., Wang, L. (2015). Learning representative deep features for image set analysis. IEEE Transactions Multimedia, 17(11), 1960–1968.

  • Wu, Z., Huang, Y., Wang, L., Wang, X., Tan, T. (2017). A comprehensive study on cross-view gait based human identification with deep cnns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(2), 209–226.

  • Xia, Y., Gao, Q., Ye, Q. (2015). Classification of gait rhythm signals between patients with neuro-degenerative diseases and normal subjects: Experiments with statistical features and different classification models. Biomedical Signal Processing and Control, 18, 254–262.

  • Xu, H., Das, A., Saenko, K. (2017). R-C3D: region convolutional 3d network for temporal activity detection. CoRR arXiv:1703.07814.

  • Yam, C., Nixon, M.S., Carter, J.N. (2004). Automated person recognition by walking and running via model-based approaches. Pattern Recognition, 37(5), 1057–1072.

  • Yan, C.C., Xie, H., Liu, S., Yin, J., Zhang, Y., Dai, Q. (2017a). Effective uyghur language text detection in complex background images for traffic prompt identification. IEEE Trans. Intelligent Transportation Systems.

  • Yan, C.C., Xie, H., Yang, D., Yin, J., Zhang, Y., Dai, Q. (2017b). Supervised hash coding with deep neural network for environment perception of intelligent vehicles. IEEE Trans. Intelligent Transportation Systems.

  • Yan, C.C., Zhang, Y., Xu, J., Dai, F., Li, L., Dai, Q., Wu, F. (2014). A highly parallel framework for HEVC coding unit partitioning tree decision on many-core processors. IEEE Signal Process. Lett, 21 (5), 573–576.

  • Yan, C.C., Zhang, Y., Xu, J., Dai, F., Zhang, J., Dai, Q., Wu, F. (2014). Efficient parallel framework for HEVC motion estimation on many-core processors. IEEE Transactions Circuits Syst. Video Techn, 24 (12), 2077–2089.

  • Yu, S., Tan, D., Tan, T. (2006). A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. In Proceedings of international conference on pattern recognition, vol. 4, pp. 441–444. IEEE.

  • Yuan, X., Lai, W., Mei, T., Hua, X., Wu, X., Li, S. (2006). Automatic video genre categorization using hierarchical svm. In Proceedings of international conference on image processing, pp. 2905–2908. IEEE.

  • Zha, Z., Mei, T., Wang, Z., Hua, X. (2007). Building a comprehensive ontology to refine video concept detection. In Proceedings of the international workshop on multimedia information retrieval, pp. 227–236. ACM.

  • Zhang, C., Liu, W., Ma, H., Fu, H. (2016). Siamese neural network based gait recognition for human identification. In IEEE international conference on acoustics, speech and signal processing, pp. 2832–2836.

  • Zhang, D., & Shah, M. (2015). Human pose estimation in videos. In Proceedings of IEEE international conference on computer vision, pp. 2012–2020.

  • Zhang, L., Xiang, T., Gong, S. (2016). Learning a discriminative null space for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1239–1248.

  • Zolfaghari, M., Oliveira, G.L., Sedaghat, N., Brox, T. (2017). Chained multi-stream networks exploiting pose, motion, and appearance for action classification and detection. CoRR arXiv:1704.00616.

Acknowledgements

This work is partially supported by the Funds for International Cooperation and Exchange of the National Natural Science Foundation of China (No. 61720106007), the NSFC-Guangdong Joint Fund (No. U1501254), the National Natural Science Foundation of China (No. 61602049), and the Cosponsored Project of Beijing Committee of Education.

Author information

Corresponding author

Correspondence to Wu Liu.

About this article

Cite this article

Liu, W., Zhang, C., Ma, H. et al. Learning Efficient Spatial-Temporal Gait Features with Deep Learning for Human Identification. Neuroinform 16, 457–471 (2018). https://doi.org/10.1007/s12021-018-9362-4
