Abstract
Yoga has become an essential part of modern life, and hence there is a tremendous demand for self-training yoga platforms that support practice without a trainer. Robust and efficient recognition of yoga poses in video streams is the first requirement of such systems. However, existing techniques for yoga pose recognition are compute-intensive and may fail under complex real-world conditions, which seriously limits their practical applicability. To address this, this paper examines four state-of-the-art deep learning architectures to implement a robust and compute-efficient system for real-time yoga pose recognition on a resource-constrained embedded platform. The first architecture is a hybrid CNN & LSTM model, while the other three (3DCNN Model1, 3DCNN Model2, and 3DCNN Model3) build on C3D, a 3DCNN model pre-trained on Sports1M. We assessed the designed architectures on a publicly available yoga pose database using four well-known metrics: recognition accuracy, precision, recall, and F1-score. Over three database splits, the hybrid CNN & LSTM, 3DCNN Model1, 3DCNN Model2, and 3DCNN Model3 achieved mean recognition accuracies of 98.80%, 99.07%, 98.19%, and 98.43%, respectively. On one of the splits, the best-performing model reached a recognition accuracy of 99.65%, surpassing the baseline accuracy of 99.38%. Moreover, the optimal model runs at 31 FPS on an Nvidia GPU-enabled desktop, compared with 3 FPS for the previous best. Finally, to evaluate the model's efficiency on embedded systems, we optimized it using the TensorRT SDK and deployed it on an Nvidia Xavier embedded platform. The optimized model runs at 8 FPS on this resource-constrained platform, demonstrating its suitability for real-world applications.
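The abstract reports recognition accuracy, precision, recall, and F1-score but does not state how per-class scores are aggregated across pose classes. The following is a minimal sketch that computes all four metrics from true and predicted pose labels, assuming macro averaging (every pose class weighted equally). The function name `classification_metrics` and the pose labels in the usage example are hypothetical and are not taken from the authors' repository.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1-score.

    Assumption: precision, recall, and F1 are computed per class and then
    averaged with equal weight (macro averaging); the paper's exact
    averaging scheme is not given in the abstract.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    classes = sorted(set(y_true) | set(y_pred))
    tp = Counter()  # correctly predicted samples per class
    fp = Counter()  # samples predicted as class c that belong to another class
    fn = Counter()  # samples of class c that were predicted as another class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def safe_div(a, b):
        # Define 0/0 as 0 so classes never predicted do not raise errors.
        return a / b if b else 0.0

    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = safe_div(tp[c], tp[c] + fp[c])
        rec = safe_div(tp[c], tp[c] + fn[c])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))

    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical per-clip labels for three pose classes.
    truth = ["tree", "tree", "cobra", "cobra", "warrior", "warrior"]
    preds = ["tree", "cobra", "cobra", "cobra", "warrior", "tree"]
    m = classification_metrics(truth, preds)
    print({k: round(v, 3) for k, v in m.items()})
    # → {'accuracy': 0.667, 'precision': 0.722, 'recall': 0.667, 'f1': 0.656}
```

Macro averaging is chosen here because it keeps rare poses from being hidden by frequent ones; a weighted average (by class support) would give different precision, recall, and F1 values on imbalanced data while leaving accuracy unchanged.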
A working demo of the developed system is available at https://youtu.be/at1GJ8Nxx38, and the source codes are available at https://github.com/sumeetssaurav/Yoga-Pose-Classification.
Data Availability
All datasets used in the study are freely available.
Code Availability
Custom code.
References
Alp Güler R, Neverova N, Kokkinos I (2018) Densepose: dense human pose estimation in the wild. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 7297–7306. https://doi.org/10.1109/CVPR.2018.00762
Ashraf FB, Islam MU, Kabir MR et al (2023) Yonet: A neural network for yoga pose classification. SN Comput Sci 4(2):198
Bai L, Efstratiou C, Ang CS (2016) Wesport: utilising wrist-band sensing to detect player activities in basketball games. In: 2016 IEEE international conference on pervasive computing and communication workshops (PerCom Workshops), IEEE, pp 1–6. https://doi.org/10.1109/PERCOMW.2016.7457167
Cao Z, Simon T, Wei SE, et al (2017) Realtime multi-person 2d pose estimation using part affinity fields. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 7291–7299. https://doi.org/10.1109/CVPR.2017.143
Chen C, Wang G, Peng C et al (2019) Improved robust video saliency detection based on long-term spatial-temporal information. IEEE Trans Image Process 29:1090–1100
Chen C, Wang G, Peng C et al (2021) Exploring rich and efficient spatial temporal interactions for real-time video salient object detection. IEEE Trans Image Process 30:3995–4007
Chen HT, He YZ, Chou CL, et al (2013) Computer-assisted self-training system for sports exercise using kinects. In: 2013 IEEE international conference on multimedia and expo workshops (ICMEW), IEEE, pp 1–4. https://doi.org/10.1109/ICMEW.2013.6618307
Chen HT, He YZ, Hsu CC, et al (2014) Yoga posture recognition for self-training. In: international conference on multimedia modeling, Springer, pp 496–505. https://doi.org/10.1007/978-3-319-04114-8_42
Chen HT, He YZ, Hsu CC (2018) Computer-assisted yoga training system. Multimed Tools Appl 77(18):23969–23991. https://doi.org/10.1007/s11042-018-5721-2
Connaghan D, Kelly P, O’Connor NE, et al (2011) Multi-sensor classification of tennis strokes. In: SENSORS, 2011 IEEE, IEEE, pp 1437–1440. https://doi.org/10.1109/ICSENS.2011.6127084
Dantone M, Gall J, Leistner C, et al (2013) Human pose estimation using body parts dependent joint regressors. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 3041–3048. https://doi.org/10.1109/CVPR.2013.391
De Michelis E (2005) A history of modern yoga: patanjali and western esotericism. A&C Black
Desai M, Mewada H (2023) A novel approach for yoga pose estimation based on in-depth analysis of human body joint detection accuracy. PeerJ Comput Sci 9:e1152
Ditty M, Karandikar A, Reed D (2018) Nvidia’s xavier soc. In: hot chips: a symposium on high performance chips
Fang HS, Xie S, Tai YW, et al (2017) Rmpe: regional multi-person pose estimation. In: proceedings of the IEEE international conference on computer vision, pp 2334–2343. https://doi.org/10.1109/ICCV.2017.256
Gaiswinkler L, Unterrainer H (2016) The relationship between yoga involvement, mindfulness and psychological well-being. Complement Ther Med 26:123–127
Gan D, Wang Y, Zhang N et al (2017) Enhancing short-term probabilistic residential load forecasting with quantile long-short-term memory. J Eng 14:2622–2627
Gao Z, Zhang H, Liu AA et al (2016) Human action recognition on depth dataset. Neural Comput Appl 27(7):2047–2054. https://doi.org/10.1007/s00521-015-2002-0
Garg S, Saxena A, Gupta R (2022) Yoga pose classification: a cnn and mediapipe inspired deep learning approach for real-world application. Journal of ambient intelligence and humanized computing pp 1–12
Graves A, Mohamed Ar, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing, IEEE, pp 6645–6649
Greff K, Srivastava RK, Koutník J et al (2016) Lstm: A search space odyssey. IEEE Trans Neural Netw Learn Syst 28(10):2222–2232. https://doi.org/10.1109/TNNLS.2016.2582924
Guddeti RR, Dang G, Williams MA et al (2019) Role of yoga in cardiac disease and rehabilitation. J Cardiopulm Rehabil Prev 39(3):146–152. https://doi.org/10.1097/hcr.0000000000000372
Gupta A, Gupta HP (2021) Yogahelp: Leveraging motion sensors for learning correct execution of yoga with feedback. IEEE Trans Artif Intell 2(4):362–371
Hsieh CC, Wu BS, Lee CC (2011) A distance computer vision assisted yoga learning system. J Comput 6(11):2382–2388. https://doi.org/10.4304/jcp.6.11.2382-2388
Huang Z, Liu Y, Fang Y, et al (2018) Video-based fall detection for seniors with human pose estimation. In: 2018 4th international conference on universal village (UV), IEEE, pp 1–4
Jain S, Rustagi A, Saurav S et al (2021) Three-dimensional cnn-inspired deep learning architecture for yoga pose recognition in the real-world environment. Neural Comput Appl 33:6427–6441
Joo H, Liu H, Tan L, et al (2015) Panoptic studio: A massively multiview system for social motion capture. In: proceedings of the IEEE international conference on computer vision, pp 3334–3342. https://doi.org/10.1109/ICCV.2015.381
Kelly P, Healy A, Moran K, et al (2010) A virtual coaching environment for improving golf swing technique. In: proceedings of the 2010 ACM workshop on surreal media and virtual cloning, pp 51–56. https://doi.org/10.1145/1878083.1878098
Li J, Zhang D, Shi L et al (2023) An improved high-resolution network-based method for yoga-pose estimation. Appl Sci 13(15):8912
Li Y, Li S, Chen C et al (2020) A plug-and-play scheme to adapt image saliency deep model for video data. IEEE Trans Circuits Syst Video Technol 31(6):2315–2327
Lim SA, Cheong KJ (2015) Regular yoga practice improves antioxidant status, immune function, and stress hormone releases in young healthy people: a randomized, double-blind, controlled pilot study. J Altern Complement Med 21(9):530–538. https://doi.org/10.1089/acm.2014.0044
Liu Y, Stoll C, Gall J, et al (2011) Markerless motion capture of interacting characters using multi-view image segmentation. In: CVPR 2011, IEEE, pp 1249–1256. https://doi.org/10.1109/CVPR.2011.5995424
Lu N, Wu Y, Feng L et al (2018) Deep learning for fall detection: Three-dimensional cnn combined with lstm on video kinematic data. IEEE J Biomed Health Inform 23(1):314–323. https://doi.org/10.1109/JBHI.2018.2808281
Luo Z, Yang W, Ding ZQ, et al (2011) "left arm up!" interactive yoga training in virtual environment. In: 2011 IEEE virtual reality conference, IEEE, pp 261–262. https://doi.org/10.1109/VR.2011.5759498
Maanijou R, Mirroshandel SA (2019) Introducing an expert system for prediction of soccer player ranking using ensemble learning. Neural Comput Appl 31(12):9157–9174. https://doi.org/10.1007/s00521-019-04036-9
Martinez J, Hossain R, Romero J, et al (2017) A simple yet effective baseline for 3d human pose estimation. In: proceedings of the IEEE international conference on computer vision, pp 2640–2649. https://doi.org/10.1109/ICCV.2017.288
Mohanty A, Ahmed A, Goswami T, et al (2017) Robust pose recognition using deep learning. In: proceedings of international conference on computer vision and image processing, Springer, pp 93–105. https://doi.org/10.1007/978-981-10-2107-7_9
Nordsborg NB, Espinosa HG, Thiel DV (2014) Estimating energy expenditure during front crawl swimming using accelerometers. Procedia Eng 72:132–137. https://doi.org/10.1016/j.proeng.2014.06.024
Okonta NR (2012) Does yoga therapy reduce blood pressure in patients with hypertension?: an integrative review. Holist Nurs Pract 26(3):137–141
Palanimeera J, Ponmozhi K (2023) Yoga posture recognition by learning spatial-temporal feature with deep learning techniques. International journal of image and graphics p 2450055
Pascoe MC, Thompson DR, Ski CF (2017) Yoga, mindfulness-based stress reduction and stress-related physiological measures: A meta-analysis. Psychoneuroendocrinology 86:152–168
Patil S, Pawar A, Peshave A, et al (2011) Yoga tutor visualization and analysis using surf algorithm. In: 2011 IEEE Control and System Graduate Research Colloquium, IEEE, pp 43–46. https://doi.org/10.1109/ICSGRC.2011.5991827
Prathikanti S, Rivera R, Cochran A et al (2017) Treating major depression with yoga: A prospective, randomized, controlled pilot trial. PLoS ONE. https://doi.org/10.1371/journal.pone.0173869
Przednowek K, Wiktorowicz K, Krzeszowski T et al (2019) A web-oriented expert system for planning hurdles race training programmes. Neural Comput Appl 31(11):7227–7243. https://doi.org/10.1007/s00521-018-3559-1
Qiang B, Zhang S, Zhan Y et al (2019) Improved convolutional pose machines for human pose estimation using image sensor data. Sensors 19(3):718. https://doi.org/10.3390/s19030718
Rector K, Bennett CL, Kientz JA (2013) Eyes-free yoga: an exergame using depth cameras for blind & low vision exercise. In: proceedings of the 15th international ACM SIGACCESS conference on computers and accessibility, pp 1–8. https://doi.org/10.1145/2513383.2513392
Russakovsky O, Deng J, Su H et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y
Sarubin N, Nothdurfter C, Schüle C et al (2014) The influence of hatha yoga as an add-on treatment in major depression on hypothalamic-pituitary-adrenal-axis activity: A randomized trial. J Psychiatr Res 53:76–83
Sathyanarayanan G, Vengadavaradan A, Bharadwaj B (2019) Role of yoga and mindfulness in severe mental illnesses: A narrative review. International journal of yoga 12(1):3. https://doi.org/10.4103/ijoy.IJOY_65_17
Saurav S, Saini R, Singh S (2021) A dual-stream fused neural network for fall detection in multi-camera and 360° videos. Neural computing and applications pp 1–28
Schure MB, Christopher J, Christopher S (2008) Mind-body medicine and the art of self-care: teaching mindfulness to counseling students through yoga, meditation, and qigong. J Couns & Dev 86(1):47–56. https://doi.org/10.1002/j.1556-6678.2008.tb00625.x
Sethi JK, Nagendra H, Ganpat TS (2013) Yoga improves attention and self-esteem in underprivileged girl student. J Educ health Promot 2. https://doi.org/10.4103/2277-9531.119043
Shan CZ, Ming ESL, Rahman HA, et al (2015) Investigation of upper limb movement during badminton smash. In: 2015 10th asian control conference (ASCC), IEEE, pp 1–6. https://doi.org/10.1109/ASCC.2015.7244605
Sharma A, Agrawal Y, Shah Y, et al (2022) Iyogacare: real-time yoga recognition and self-correction for smart healthcare. IEEE Consumer electronics magazine
Shotton J, Fitzgibbon A, Cook M, et al (2011) Real-time human pose recognition in parts from single depth images. In: CVPR 2011, IEEE, pp 1297–1304. https://doi.org/10.1109/CVPR.2011.5995316
Soomro K, Zamir AR, Shah M (2012) Ucf101: A dataset of 101 human actions classes from videos in the wild. arXiv:1212.0402 doi.org/10.48550
Swain D, Satapathy S, Acharya B et al (2022) Deep learning models for yoga pose monitoring. Algorithms 15(11):403
Tian Y, Zitnick CL, Narasimhan SG (2012) Exploring the spatial hierarchy of mixture models for human pose estimation. In: european conference on computer vision, Springer, pp 256–269 https://doi.org/10.1007/978-3-642-33715-4_19
Tompson JJ, Jain A, LeCun Y, et al (2014) Joint training of a convolutional network and a graphical model for human pose estimation. In: advances in neural information processing systems, pp 1799–1807. https://doi.org/10.5555/2968826.2969027
Toshev A, Szegedy C (2014) Deeppose: Human pose estimation via deep neural networks. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 1653–1660. https://doi.org/10.1109/CVPR.2014.214
Tran D, Bourdev L, Fergus R, et al (2015) Learning spatiotemporal features with 3d convolutional networks. In: proceedings of the IEEE international conference on computer vision, pp 4489–4497. https://doi.org/10.1109/ICCV.2015.510
Trejo EW, Yuan P (2018) Recognition of yoga poses through an interactive system with kinect device. In: 2018 2nd international conference on robotics and automation sciences (ICRAS), IEEE, pp 1–5. https://doi.org/10.1109/ICRAS.2018.8443267
Ullah A, Ahmad J, Muhammad K et al (2017) Action recognition in video sequences using deep bi-directional lstm with cnn features. IEEE Access 6:1155–1166. https://doi.org/10.1109/ACCESS.2017.2778011
Upadhyay A, Basha NK, Ananthakrishnan B (2023) Deep learning-based yoga posture recognition using the y_pn-mssd model for yoga practitioners. In: healthcare, MDPI, p 609
Vallabhaneni N, Prabhavathy P (2023) Segmentation quality assessment network-based object detection and optimized cnn with transfer learning for yoga pose classification for health care. Soft Computing pp 1–23
Verma M, Kumawat S, Nakashima Y, et al (2020) Yoga-82: a new dataset for fine-grained classification of human poses. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 1038–1039
Waldron M, Twist C, Highton J et al (2011) Movement and physiological match demands of elite rugby league using portable global positioning systems. J Sports Sci 29(11):1223–1230. https://doi.org/10.1080/02640414.2011.587445
Wang C, Wang Y, Lin Z, et al (2014) Robust estimation of 3d human poses from a single image. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 2361–2368. https://doi.org/10.1109/CVPR.2014.303
Wang J, Yu LC, Lai KR, et al (2016) Dimensional sentiment analysis using a regional cnn-lstm model. In: proceedings of the 54th annual meeting of the association for computational linguistics (Volume 2: short papers), pp 225–230. https://doi.org/10.18653/v1/P16-2037
Wang L, Xu Y, Cheng J et al (2018) Human action recognition by learning spatio-temporal features with deep neural networks. IEEE access 6:17913–17922. https://doi.org/10.1109/ACCESS.2018.2817253
Wei G, Zhou H, Zhang L et al (2023) Spatial-temporal self-attention enhanced graph convolutional networks for fitness yoga action recognition. Sensors 23(10):4741
Wu W, Yin W, Guo F (2010) Learning and self-instruction expert system for yoga. In: 2010 2nd international workshop on intelligent systems and applications, IEEE, pp 1–4. https://doi.org/10.1109/IWISA.2010.5473592
Wu Y, Lin Q, Yang M, et al (2021) A computer vision-based yoga pose grading approach using contrastive skeleton feature representations. In: healthcare, MDPI, p 36
Wu Z, Zhang J, Chen K et al (2019) Yoga posture recognition and quantitative evaluation with wearable sensors based on two-stage classifier and prior bayesian network. Sensors 19(23):5129. https://doi.org/10.3390/s19235129
Xie S, Girshick R, Dollár P, et al (2017) Aggregated residual transformations for deep neural networks. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 1492–1500. https://doi.org/10.1109/CVPR.2017.634
Yadav SK, Singh A, Gupta A et al (2019) Real-time yoga recognition using deep learning. Neural Comput Appl 31(12):9349–9361. https://doi.org/10.1007/s00521-019-04232-7
Yadav SK, Agarwal A, Kumar A et al (2022) Yognet: A two-stream network for realtime multiperson yoga action recognition and posture correction. Knowl-Based Syst 250:109097
Yahya U, Senanayake SA, Naim A (2018) A database-driven neural computing framework for classification of vertical jump patterns of healthy female netballers using 3d kinematics–emg features. Neural Computing and Applications, pp 1–20. https://doi.org/10.1007/s00521-018-3653-4
Zhang L, Zhu G, Shen P, et al (2017) Learning spatiotemporal features using 3dcnn and convolutional lstm for gesture recognition. In: proceedings of the IEEE international conference on computer vision workshops, pp 3120–3128
Zhang L, Zhu G, Mei L, et al (2018) Attention in convolutional lstm for gesture recognition. In: proceedings of the 32nd international conference on neural information processing systems, pp 1957–1966
Acknowledgements
The authors would like to acknowledge the support of Director, CSIR-CEERI, Pilani for providing the necessary infrastructure. The authors would also like to thank Yadav et al. for making their Yoga pose dataset publicly available.
Funding
No external funding.
Ethics declarations
Conflicts of interest
The authors declare that they have no competing interests that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Saurav, S., Gidde, P. & Singh, S. Exploration of deep learning architectures for real-time yoga pose recognition. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18694-y
DOI: https://doi.org/10.1007/s11042-024-18694-y