Real-time human action prediction using pose estimation with attention-based LSTM network

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Human action prediction in live-streaming video is a popular task in computer vision and pattern recognition. It attempts to identify activities performed by a human in an image or video. Artificial intelligence (AI)-based technologies are now required for security and human behaviour analysis, and the actions involved exhibit intricate motion patterns. For the visual representation of video frames, conventional action identification approaches mostly rely on pre-trained weights of various AI architectures. This paper proposes a deep neural network, an attention-based long short-term memory (LSTM) network, for skeleton-based activity prediction from video. The proposed model has been evaluated on the Berkeley MHAD dataset, which contains 11 action classes. Experimental results compare a plain LSTM against the attention-based LSTM network on six action classes: jumping, clapping, stand-up, sit-down, waving one hand (right), and waving two hands. The proposed method has also been tested in a real-time environment and is unaffected by pose, camera orientation, and apparel. The proposed system attains an accuracy of 95.94% on the Berkeley MHAD dataset. Hence, the proposed method is useful in intelligent vision computing systems for automatically identifying human activity in unpremeditated behaviour.
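
Since the full text is paywalled, only the abstract's description of the model is available here. As a rough illustration, the following is a minimal PyTorch sketch of an attention-based LSTM classifier over pose-keypoint sequences, in the spirit of the abstract. The six output classes match the experiment described above; the 25-joint 2D skeleton (e.g., an OpenPose-style output), the hidden size, and the soft-attention pooling over time are assumptions for the sketch, not the authors' implementation.

# Minimal sketch (assumptions, not the authors' code): an attention-based LSTM
# that classifies a sequence of 2D pose keypoints into one of six actions.
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    def __init__(self, num_joints=25, hidden_size=128, num_classes=6):
        super().__init__()
        # Each frame is a flattened skeleton: (x, y) for every joint.
        self.lstm = nn.LSTM(input_size=num_joints * 2,
                            hidden_size=hidden_size, batch_first=True)
        self.attn = nn.Linear(hidden_size, 1)  # one relevance score per frame
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, frames, num_joints * 2)
        h, _ = self.lstm(x)                            # (batch, frames, hidden)
        weights = torch.softmax(self.attn(h), dim=1)   # attention over time
        context = (weights * h).sum(dim=1)             # weighted temporal pooling
        return self.fc(context)                        # class logits

# Example: 4 clips of 32 frames, 25 joints with (x, y) coordinates each.
model = AttentionLSTM()
logits = model(torch.randn(4, 32, 50))                 # shape: (4, 6)

In use, per-frame keypoints produced by a pose estimator would be normalised and stacked into such a (frames, joints * 2) tensor before classification, so the attention weights indicate which frames of the motion the network found most discriminative.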

Data availability

This paper uses the Berkeley Multimodal Human Action Database (MHAD) for training the model. The dataset is released under the BS2 license and is available at: https://tele-immersion.citris-uc.org/berkeley_mhad

Acknowledgements

The research was carried out as a part of the project "Development of a Computer Vision system for an AI assistant", funded by the Yukti Sanchita Programme (2021) of the Indian Space Research Organisation (ISRO), Trivandrum.

Funding

The research was carried out as a part of the project "Development of a Computer Vision system for an AI assistant", funded by the Yukti Sanchita Programme, 2021 (PR No. YS/PD-IP/2021/326), of the Indian Space Research Organisation (ISRO), Trivandrum.

Author information

Authors and Affiliations

Authors

Contributions

B: formulated the problem statement, carried out the literature review of existing works, designed the proposed algorithm, and wrote the manuscript. RS: implemented the proposed algorithm, evaluated the model's performance, and wrote portions of the manuscript. MS: formulated the problem statement, reviewed the work, guided the preparation of the manuscript, and corrected the manuscript. MS: reviewed the progress of the work and guided the correction of the manuscript. SKC: reviewed the manuscript.

Corresponding author

Correspondence to M. Sridevi.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bharathi, A., Sanku, R., Sridevi, M. et al. Real-time human action prediction using pose estimation with attention-based LSTM network. SIViP 18, 3255–3264 (2024). https://doi.org/10.1007/s11760-023-02987-0
