Multimodal feature fusion for CNN-based gait recognition: an empirical comparison

  • Original Article
Neural Computing and Applications

Abstract

Identifying people in video by the way they walk (i.e., gait) is a relevant, noninvasive task in computer vision. Current standard approaches typically derive gait signatures from sequences of binary energy maps of subjects extracted from images, but this process introduces a large amount of non-stationary noise, which limits their efficacy. In contrast, in this paper we focus on the raw pixels, or simple functions derived from them, and let advanced learning techniques extract the relevant features. We therefore present a comparative study of different convolutional neural network (CNN) architectures using three modalities (i.e., gray pixels, optical flow channels and depth maps) on two widely adopted and challenging datasets: TUM-GAID and CASIA-B. In addition, we compare different early and late fusion methods for combining the information obtained from each modality. Our experimental results suggest that (1) raw pixel values are a competitive input modality compared to traditional state-of-the-art silhouette-based features (e.g., GEI), since equivalent or better results are obtained; (2) fusing the raw pixel information with information from optical flow and depth maps yields state-of-the-art results on the gait recognition task with an image resolution several times smaller than that used in previously reported results; and (3) the selection and design of the CNN architecture are critical points that can make the difference between state-of-the-art results and poor ones.
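To make the distinction between early and late fusion concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which fusion rules the paper evaluates, so the concatenation, product and weighted-sum rules below are generic textbook choices, and every function name and number is hypothetical.

```python
from math import prod


def early_fusion(features_per_modality):
    """Early fusion (generic form): concatenate the feature vectors of all
    modalities into a single descriptor before classification."""
    fused = []
    for feats in features_per_modality:
        fused.extend(feats)
    return fused


def late_fusion_product(probs_per_modality):
    """Late fusion, product rule: multiply the per-class probabilities
    produced by each modality's classifier, then renormalize."""
    n_classes = len(probs_per_modality[0])
    scores = [prod(p[c] for p in probs_per_modality) for c in range(n_classes)]
    total = sum(scores)
    return [s / total for s in scores]


def late_fusion_weighted_sum(probs_per_modality, weights):
    """Late fusion, weighted-sum rule: average the per-class probabilities
    with one (hypothetical) weight per modality."""
    n_classes = len(probs_per_modality[0])
    weight_total = sum(weights)
    return [
        sum(w * p[c] for w, p in zip(weights, probs_per_modality)) / weight_total
        for c in range(n_classes)
    ]


def predict(fused_probs):
    """Return the index of the most probable identity."""
    return max(range(len(fused_probs)), key=fused_probs.__getitem__)


# Made-up softmax outputs over three identities, one vector per modality.
gray = [0.5, 0.3, 0.2]
flow = [0.2, 0.6, 0.2]
depth = [0.3, 0.4, 0.3]

product_probs = late_fusion_product([gray, flow, depth])
sum_probs = late_fusion_weighted_sum([gray, flow, depth], weights=[1.0, 1.0, 1.0])
descriptor = early_fusion([gray, flow, depth])
```

In this toy example the gray-pixel classifier alone would pick identity 0, while both late fusion rules pick identity 1 once the optical flow and depth scores are taken into account. Early fusion instead hands the concatenated descriptor to a single classifier, so the combination is learned rather than fixed.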

Acknowledgements

This work has been funded by project TIC-1692 (Junta de Andalucía). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research. Portions of the research in this paper use the CASIA Gait Database collected by Institute of Automation, Chinese Academy of Sciences.

Author information

Corresponding author

Correspondence to Francisco M. Castro.

Ethics declarations

Conflict of interest

This work has been funded by a research project of Junta de Andalucía, Spain. Moreover, Francisco M. Castro and Nicolás Guil work for the University of Málaga, Manuel J. Marín-Jiménez works for the University of Córdoba, and Nicolás Pérez de la Blanca works for the University of Granada.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Castro, F.M., Marín-Jiménez, M.J., Guil, N. et al. Multimodal feature fusion for CNN-based gait recognition: an empirical comparison. Neural Comput & Applic 32, 14173–14193 (2020). https://doi.org/10.1007/s00521-020-04811-z

  • DOI: https://doi.org/10.1007/s00521-020-04811-z