Abstract
Human action prediction in live-streaming video is a popular task in computer vision and pattern recognition: it attempts to identify the activities a human performs in an image or video. Artificial intelligence (AI)-based technologies are now required for security and human behaviour analysis, and the actions involved exhibit intricate motion patterns. For the visual representation of video frames, conventional action identification approaches mostly rely on the pre-trained weights of various AI architectures. This paper proposes a deep neural network, an attention-based long short-term memory (LSTM) network, for skeleton-based activity prediction from video. The proposed model has been evaluated on the Berkeley MHAD dataset, which contains 11 action classes. Experimental results compare the performance of a plain LSTM network and the attention-based LSTM network on six action classes: Jumping, Clapping, Stand-up, Sit-down, Waving one hand (right), and Waving two hands. The proposed method has also been tested in a real-time environment and is unaffected by pose, camera orientation, and apparel. The proposed system attains an accuracy of 95.94% on the Berkeley MHAD dataset. Hence, the proposed method is useful in an intelligent vision computing system for automatically identifying human activity in unpremeditated behaviour.
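To make the described pipeline concrete, the following is a minimal sketch of an attention-based LSTM classifier over skeleton keypoint sequences, of the kind the abstract outlines. It assumes 2D joints supplied by a pose estimator (e.g., 18 OpenPose-style keypoints per frame) and the six-class setting above; the layer sizes, the soft-attention pooling over time, and all hyperparameters are illustrative assumptions, not the authors' published configuration.

```python
# A minimal sketch (not the authors' exact implementation) of an
# attention-based LSTM classifier over skeleton keypoint sequences.
# The joint count, hidden size, and attention formulation are assumptions.
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    def __init__(self, num_joints=18, coords=2, hidden=128, num_classes=6):
        super().__init__()
        # Each frame is a flattened skeleton: num_joints * coords features.
        self.lstm = nn.LSTM(num_joints * coords, hidden, batch_first=True)
        # Score each time step; softmax over time yields attention weights.
        self.attn = nn.Linear(hidden, 1)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                         # x: (batch, frames, joints*coords)
        h, _ = self.lstm(x)                       # h: (batch, frames, hidden)
        w = torch.softmax(self.attn(h), dim=1)    # attention over frames: (batch, frames, 1)
        context = (w * h).sum(dim=1)              # attention-weighted sum: (batch, hidden)
        return self.fc(context)                   # class logits: (batch, num_classes)

# Usage: a batch of 32 clips, 40 frames each, 18 joints with (x, y) coordinates.
model = AttentionLSTM()
logits = model(torch.randn(32, 40, 18 * 2))
print(logits.shape)  # torch.Size([32, 6])
```

The attention layer lets the classifier weight informative frames (e.g., the peak of a jump) more heavily than idle ones, which is the intuition behind preferring an attention-based LSTM over plain LSTM pooling.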
Data availability
This paper uses the Berkeley Multimodal Human Action Database (MHAD) for training the model. The dataset is released under a BSD-2 license and is available at: https://tele-immersion.citris-uc.org/berkeley_mhad
Acknowledgements
The research was carried out as part of the project “Development of a Computer Vision system for an AI assistant”, funded by the Yukti Sanchita Programme (2021) of the Indian Space Research Organisation (ISRO), Trivandrum.
Funding
The research was carried out as part of the project "Development of a Computer Vision system for an AI assistant", funded by the Yukti Sanchita Programme, 2021 (PR No. YS/PD-IP/2021/326), of the Indian Space Research Organisation (ISRO), Trivandrum.
Author information
Contributions
B: formulated the problem statement, carried out the literature review of existing works, designed the proposed algorithm, and wrote the manuscript. RS: implemented the proposed algorithm, evaluated the model's performance, and wrote portions of the manuscript. MS: formulated the problem statement, reviewed the work, guided the preparation of the manuscript, and corrected the manuscript. MS: reviewed the progress of the work and guided the correction of the manuscript. SKC: reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bharathi, A., Sanku, R., Sridevi, M. et al. Real-time human action prediction using pose estimation with attention-based LSTM network. SIViP 18, 3255–3264 (2024). https://doi.org/10.1007/s11760-023-02987-0