Abstract
Human action prediction in live-streaming video is a popular task in computer vision and pattern recognition: it attempts to identify the activities a human performs in an image or video. Artificial intelligence (AI)-based technologies are now required for security and human behaviour analysis, and the actions involved exhibit intricate motion patterns. For the visual representation of video frames, conventional action identification approaches mostly rely on the pre-trained weights of various AI architectures. This paper proposes a deep neural network, an attention-based long short-term memory (LSTM) network, for skeleton-based activity prediction from video. The proposed model has been evaluated on the Berkeley MHAD dataset, which contains 11 action classes. Experimental results compare the performance of a plain LSTM network and the attention-based LSTM network on six action classes: Jumping, Clapping, Stand-up, Sit-down, Waving one hand (right), and Waving two hands. The proposed method has also been tested in a real-time environment and is unaffected by pose, camera orientation, and apparel. The proposed system attains an accuracy of 95.94% on the Berkeley MHAD dataset. Hence, the proposed method is useful in an intelligent vision computing system for automatically identifying human activity in unpremeditated behaviour.
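To make the described pipeline concrete, the following is a minimal sketch of an attention-based LSTM classifier over skeleton keypoint sequences, of the kind the abstract outlines. It assumes 2D joints supplied by a pose estimator (e.g., 18 OpenPose-style keypoints per frame) and the six-class setting above; the layer sizes, the soft-attention pooling over time, and all hyperparameters are illustrative assumptions, not the authors' published configuration.

```python
# A minimal sketch (not the authors' exact implementation) of an
# attention-based LSTM classifier over skeleton keypoint sequences.
# The joint count, hidden size, and attention formulation are assumptions.
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    def __init__(self, num_joints=18, coords=2, hidden=128, num_classes=6):
        super().__init__()
        # Each frame is a flattened skeleton: num_joints * coords features.
        self.lstm = nn.LSTM(num_joints * coords, hidden, batch_first=True)
        # Score each time step; softmax over time yields attention weights.
        self.attn = nn.Linear(hidden, 1)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                         # x: (batch, frames, joints*coords)
        h, _ = self.lstm(x)                       # h: (batch, frames, hidden)
        w = torch.softmax(self.attn(h), dim=1)    # attention over frames: (batch, frames, 1)
        context = (w * h).sum(dim=1)              # attention-weighted sum: (batch, hidden)
        return self.fc(context)                   # class logits: (batch, num_classes)

# Usage: a batch of 32 clips, 40 frames each, 18 joints with (x, y) coordinates.
model = AttentionLSTM()
logits = model(torch.randn(32, 40, 18 * 2))
print(logits.shape)  # torch.Size([32, 6])
```

The attention layer lets the classifier weight informative frames (e.g., the peak of a jump) more heavily than idle ones, which is the intuition behind preferring an attention-based LSTM over plain LSTM pooling.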
Data availability
This paper uses the Berkeley Multimodal Human Action Database (MHAD) for training the model. The dataset is released under a BSD-2 license and is available at: https://tele-immersion.citris-uc.org/berkeley_mhad
Acknowledgements
The research was carried out as part of the project “Development of a Computer Vision system for an AI assistant”, funded by the Yukti Sanchita Programme (2021) of the Indian Space Research Organisation (ISRO), Trivandrum.
Funding
The research was carried out as part of the project "Development of a Computer Vision system for an AI assistant", funded by the Yukti Sanchita Programme, 2021 (PR No. YS/PD-IP/2021/326), of the Indian Space Research Organisation (ISRO), Trivandrum.
Author information
Contributions
B: formulated the problem statement, carried out the literature review of existing works, designed the proposed algorithm, and wrote the manuscript. RS: implemented the proposed algorithm, evaluated the model's performance, and wrote portions of the manuscript. MS: formulated the problem statement, reviewed the work, guided the preparation of the manuscript, and corrected the manuscript. MS: reviewed the progress of the work and guided the correction of the manuscript. SKC: reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bharathi, A., Sanku, R., Sridevi, M. et al. Real-time human action prediction using pose estimation with attention-based LSTM network. SIViP 18, 3255–3264 (2024). https://doi.org/10.1007/s11760-023-02987-0