Multimodal feature fusion for CNN-based gait recognition: an empirical comparison

Castro, Francisco M.; Marín-Jiménez, Manuel J.; Guil, Nicolás; Pérez de la Blanca, Nicolás

doi:10.1007/s00521-020-04811-z

Original Article
Published: 07 March 2020

Volume 32, pages 14173–14193, (2020)

Neural Computing and Applications

Francisco M. Castro¹,
Manuel J. Marín-Jiménez²,
Nicolás Guil³ &
Nicolás Pérez de la Blanca⁴

Abstract

Identifying people in video by the way they walk (i.e., gait) is a relevant computer vision task that relies on a noninvasive approach. Current standard approaches typically derive gait signatures from sequences of binary energy maps of subjects extracted from images, but this process introduces a large amount of non-stationary noise, which limits their efficacy. In contrast, in this paper we focus on the raw pixels, or simple functions derived from them, and let advanced learning techniques extract the relevant features. We therefore present a comparative study of different convolutional neural network (CNN) architectures using three modalities (i.e., gray pixels, optical flow channels and depth maps) on two widely adopted and challenging datasets: TUM-GAID and CASIA-B. In addition, we compare different early and late fusion methods for combining the information obtained from each modality. Our experimental results suggest that (1) raw pixel values are a competitive input modality compared to the traditional state-of-the-art silhouette-based features (e.g., GEI), since equivalent or better results are obtained; (2) fusing raw pixel information with information from optical flow and depth maps yields state-of-the-art results on the gait recognition task with an image resolution several times smaller than that of previously reported results; and (3) the selection and design of the CNN architecture are critical and can make the difference between state-of-the-art results and poor ones.
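
The abstract does not spell out which fusion schemes were compared, so the following is only a minimal sketch of the general distinction between early and late fusion, not the authors' method. It assumes three hypothetical modality branches (gray, flow, depth) that each produce a feature vector and a vector of raw class scores; all names, numbers, and weights are invented for illustration. Early fusion is shown as feature concatenation, and late fusion as a weighted average or a normalised product of per-modality softmax probabilities.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def early_fusion(features_by_modality):
    """Early fusion: concatenate per-modality feature vectors into one vector.

    Modalities are visited in sorted name order so the layout is deterministic.
    """
    fused = []
    for name in sorted(features_by_modality):
        fused.extend(features_by_modality[name])
    return fused


def late_fusion_weighted_sum(probs_by_modality, weights):
    """Late fusion: weighted average of per-modality class probabilities."""
    num_classes = len(next(iter(probs_by_modality.values())))
    total_weight = sum(weights[name] for name in probs_by_modality)
    fused = [0.0] * num_classes
    for name, probs in probs_by_modality.items():
        for c, p in enumerate(probs):
            fused[c] += weights[name] * p / total_weight
    return fused


def late_fusion_product(probs_by_modality):
    """Late fusion: multiply per-modality probabilities, then renormalise."""
    num_classes = len(next(iter(probs_by_modality.values())))
    fused = [1.0] * num_classes
    for probs in probs_by_modality.values():
        for c, p in enumerate(probs):
            fused[c] *= p
    total = sum(fused)
    return [f / total for f in fused]


def predict(probs):
    """Return the index of the most probable class (subject identity)."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical outputs of three single-modality networks for one video clip.
features = {"gray": [0.2, 0.7], "flow": [0.9, 0.1], "depth": [0.4, 0.4]}
scores = {"gray": [2.0, 1.0, 0.1], "flow": [0.5, 2.5, 0.3], "depth": [1.0, 1.2, 0.2]}
weights = {"gray": 1.0, "flow": 1.0, "depth": 1.0}  # assumed equal weights

fused_features = early_fusion(features)  # would feed a single joint classifier
probs = {name: softmax(s) for name, s in scores.items()}
sum_fused = late_fusion_weighted_sum(probs, weights)
prod_fused = late_fusion_product(probs)

print("early-fusion feature length:", len(fused_features))
print("weighted-sum decision:", predict(sum_fused))
print("product decision:", predict(prod_fused))
```

In this sketch, early fusion combines information before classification, so one classifier sees all modalities jointly, whereas late fusion keeps one classifier per modality and merges only their output probabilities.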

Acknowledgements

This work has been funded by project TIC-1692 (Junta de Andalucía). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research. Portions of the research in this paper use the CASIA Gait Database collected by the Institute of Automation, Chinese Academy of Sciences.

Author information

Authors and Affiliations

Department of Computer Architecture, University of Malaga, Bulevar Louis Pasteur, 35, Office 2.3.8a, 29071, Malaga, Spain
Francisco M. Castro
Department of Computing and Numerical Analysis, University of Cordoba, Cordoba, Spain
Manuel J. Marín-Jiménez
Department of Computer Architecture, University of Malaga, Malaga, Spain
Nicolás Guil
Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
Nicolás Pérez de la Blanca

Corresponding author

Correspondence to Francisco M. Castro.

Ethics declarations

Conflict of interest

This work has been funded by a research project of Junta de Andalucía, Spain. Moreover, Francisco M. Castro and Nicolás Guil work for the University of Málaga, Manuel J. Marín-Jiménez works for the University of Córdoba, and Nicolás Pérez de la Blanca works for the University of Granada.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Castro, F.M., Marín-Jiménez, M.J., Guil, N. et al. Multimodal feature fusion for CNN-based gait recognition: an empirical comparison. Neural Comput & Applic 32, 14173–14193 (2020). https://doi.org/10.1007/s00521-020-04811-z

Received: 06 March 2019
Accepted: 22 February 2020
Published: 07 March 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s00521-020-04811-z
