
A Novel CNN-LSTM Hybrid Architecture for the Recognition of Human Activities

Part of the Proceedings of the International Neural Networks Society book series (INNS, volume 3)

Abstract

The problem of human activity recognition (HAR) has been attracting increasing attention from the research community and has numerous applications. In this paper we propose a multi-modal approach to video-based HAR. Our approach uses three modalities: raw RGB video data, depth sequences, and 3D skeletal motion data. The latter are transformed into a 2D image representation in the spectral domain. To extract spatio-temporal features from the available data, we propose a novel hybrid deep neural network architecture that combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network. We focus on the recognition of activities of daily living (ADLs) and medical conditions, and we evaluate our approach on two challenging datasets.
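The exact spectral encoding of the skeletal data is not given in this preview. As a rough illustration of the idea, the following numpy sketch maps a skeleton sequence to a pseudo-RGB image of per-joint temporal frequency magnitudes; the function name, the FFT-magnitude encoding, and the normalization are assumptions, not the authors' method.

```python
import numpy as np

def skeleton_to_spectral_image(seq):
    """Hypothetical sketch: map a skeleton sequence of shape
    (T frames, J joints, 3 coords) to a (J, F, 3) pseudo-RGB image,
    where F is the number of temporal frequency bins."""
    # Real-input FFT along the temporal axis: one spectrum per joint
    # and coordinate; magnitudes discard phase.
    spec = np.abs(np.fft.rfft(seq, axis=0))   # (F, J, 3), F = T // 2 + 1
    img = np.transpose(spec, (1, 0, 2))       # (J, F, 3): joints x freqs x xyz
    # Normalize each coordinate channel to [0, 255] so the result can be
    # consumed by an ordinary image CNN.
    mn = img.min(axis=(0, 1), keepdims=True)
    mx = img.max(axis=(0, 1), keepdims=True)
    img = (img - mn) / (mx - mn + 1e-8) * 255.0
    return img.astype(np.uint8)

# Example: 64 frames of a 25-joint skeleton (the NTU RGB+D joint layout)
rng = np.random.default_rng(0)
seq = rng.standard_normal((64, 25, 3))
img = skeleton_to_spectral_image(seq)
print(img.shape)  # (25, 33, 3)
```

An image built this way could then be fed to the CNN branch of the hybrid model, while the LSTM branch consumes the temporal stream directly.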

Keywords

  • Human activity recognition
  • Convolutional neural networks
  • Long short-term memory networks
  • Multimodal analysis



Acknowledgments

This project has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under grant agreement No. 273 (Funding Decision: ΓΓΕΤ122785/I2/19-07-2018).


Corresponding author

Correspondence to Evaggelos Spyrou.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Stylianou-Nikolaidou, S., Vernikos, I., Mathe, E., Spyrou, E., Mylonas, P. (2021). A Novel CNN-LSTM Hybrid Architecture for the Recognition of Human Activities. In: Iliadis, L., Macintyre, J., Jayne, C., Pimenidis, E. (eds) Proceedings of the 22nd Engineering Applications of Neural Networks Conference. EANN 2021. Proceedings of the International Neural Networks Society, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-030-80568-5_10
