Exploration of deep learning architectures for real-time yoga pose recognition

Multimedia Tools and Applications

Abstract

Yoga has become an essential part of modern life, and there is consequently strong demand for self-training platforms that support yoga practice without a trainer. The first requirement of such systems is robust and efficient recognition of yoga poses in a video stream. However, existing yoga pose recognition techniques are compute-intensive and may fail under complex real-world conditions, which seriously limits their practical applicability. To address this, this paper examines state-of-the-art deep learning techniques for implementing a robust, compute-efficient system that recognizes yoga poses in real time on a resource-constrained embedded platform. Four architectures are explored: the first is a hybrid CNN & LSTM model, while the other three (3DCNN Model1, 3DCNN Model2, and 3DCNN Model3) build on C3D, a 3DCNN model pre-trained on Sports-1M. We assessed the designed architectures on a publicly available yoga pose database using four well-known metrics: recognition accuracy, precision, recall, and F1-score. Across three database splits, the hybrid CNN & LSTM, 3DCNN Model1, 3DCNN Model2, and 3DCNN Model3 achieved mean recognition accuracies of 98.80%, 99.07%, 98.19%, and 98.43%, respectively. On one of the splits, the best-performing model reached a recognition accuracy of 99.65%, surpassing the baseline accuracy of 99.38%. The optimal model also runs at 31 FPS on an NVIDIA GPU-enabled desktop, roughly ten times faster than the previous best of 3 FPS. Finally, to evaluate the model's efficiency on embedded systems, we optimized it using the TensorRT SDK and deployed it on an NVIDIA Xavier embedded platform. The optimized model runs at 8 FPS on this resource-constrained platform, demonstrating its suitability for real-world applications.
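The frame rates quoted above (31 FPS on a desktop GPU, 8 FPS on the Xavier) describe a model labelling a live video stream. A minimal sketch of how a stream could be fed to a clip-based classifier follows, under stated assumptions: a 16-frame sliding window (the clip length the original C3D model consumes), a configurable stride between classifications, and majority-vote smoothing over recent clip labels. `ClipRecognizer`, `classify_clip`, `stride`, `vote_window`, and `measure_fps` are hypothetical names, the classifier is a stub standing in for a trained CNN & LSTM or 3DCNN model, and the authors' actual pipeline may differ.

```python
import time
from collections import Counter, deque

# Assumption: 16-frame clips, the input length used by the original C3D model.
CLIP_LEN = 16


class ClipRecognizer:
    """Hypothetical real-time wrapper around a clip classifier.

    Keeps the most recent `clip_len` frames, classifies the buffered clip every
    `stride` frames, and returns the majority label of the last `vote_window`
    clip predictions to suppress frame-to-frame flicker.
    """

    def __init__(self, classify_clip, clip_len=CLIP_LEN, stride=1, vote_window=5):
        self.classify_clip = classify_clip      # callable: list of frames -> label
        self.frames = deque(maxlen=clip_len)    # sliding window of recent frames
        self.votes = deque(maxlen=vote_window)  # recent clip-level predictions
        self.stride = stride
        self._frames_since_full = 0

    def push(self, frame):
        """Add one frame; return the smoothed label, or None until the window fills."""
        self.frames.append(frame)
        if len(self.frames) < self.frames.maxlen:
            return None
        # Classify only every `stride` frames once the window is full.
        if self._frames_since_full % self.stride == 0:
            self.votes.append(self.classify_clip(list(self.frames)))
        self._frames_since_full += 1
        # Majority vote; on a tie, the label appearing earliest in the window wins.
        return Counter(self.votes).most_common(1)[0][0]


def measure_fps(recognizer, frames):
    """Average end-to-end frames per second over a sequence of frames."""
    start = time.perf_counter()
    for frame in frames:
        recognizer.push(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stub classifier standing in for a trained model; frames are integers here.
    def stub(clip):
        return "tadasana" if clip[-1] < 20 else "vrikshasana"

    rec = ClipRecognizer(stub)
    labels = [rec.push(i) for i in range(30)]
    print(labels[15], labels[22])
    print(f"{measure_fps(ClipRecognizer(stub), list(range(1000))):.0f} FPS (stub model)")
```

A larger `stride` lowers the compute spent per frame at the cost of reacting more slowly to pose changes. With a stub, `measure_fps` times only the wrapper; on real hardware the figure would be dominated by model inference (for example, a TensorRT engine on the Xavier).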
A working demo of the developed system is available at https://youtu.be/at1GJ8Nxx38, and the source code is available at https://github.com/sumeetssaurav/Yoga-Pose-Classification.
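The abstract compares the four models using recognition accuracy, precision, recall, and F1-score. For multi-class pose recognition, precision, recall, and F1 are computed per class and then averaged; the sketch below assumes unweighted macro-averaging, which the abstract does not specify, and uses only the Python standard library. `classification_metrics` is a hypothetical helper and the pose labels in the usage example are illustrative.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score.

    Assumption: macro averaging, i.e. every pose class is weighted equally.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class was wrong
            fn[truth] += 1      # true class was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,   # mean of per-class F1 scores
    }


if __name__ == "__main__":
    y_true = ["bhujangasana", "padmasana", "padmasana", "tadasana"]
    y_pred = ["bhujangasana", "padmasana", "tadasana", "tadasana"]
    print(classification_metrics(y_true, y_pred))
```

Here macro F1 is the mean of per-class F1 scores, which can differ from the harmonic mean of macro precision and macro recall: for the four illustrative predictions, accuracy is 0.75 and macro precision and recall are both about 0.83, while macro F1 is about 0.78.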

Data Availability

All datasets used in this study are freely available.

Code Availability

Custom code. The source code is available at https://github.com/sumeetssaurav/Yoga-Pose-Classification.

References

  1. Alp Güler R, Neverova N, Kokkinos I (2018) DensePose: dense human pose estimation in the wild. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 7297–7306. https://doi.org/10.1109/CVPR.2018.00762

  2. Ashraf FB, Islam MU, Kabir MR et al (2023) YoNet: A neural network for yoga pose classification. SN Comput Sci 4(2):198

  3. Bai L, Efstratiou C, Ang CS (2016) WeSport: utilising wrist-band sensing to detect player activities in basketball games. In: 2016 IEEE international conference on pervasive computing and communication workshops (PerCom Workshops), IEEE, pp 1–6. https://doi.org/10.1109/PERCOMW.2016.7457167

  4. Cao Z, Simon T, Wei SE, et al (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 7291–7299. https://doi.org/10.1109/CVPR.2017.143

  5. Chen C, Wang G, Peng C et al (2019) Improved robust video saliency detection based on long-term spatial-temporal information. IEEE Trans Image Process 29:1090–1100

  6. Chen C, Wang G, Peng C et al (2021) Exploring rich and efficient spatial temporal interactions for real-time video salient object detection. IEEE Trans Image Process 30:3995–4007

  7. Chen HT, He YZ, Chou CL, et al (2013) Computer-assisted self-training system for sports exercise using kinects. In: 2013 IEEE international conference on multimedia and expo workshops (ICMEW), IEEE, pp 1–4. https://doi.org/10.1109/ICMEW.2013.6618307

  8. Chen HT, He YZ, Hsu CC, et al (2014) Yoga posture recognition for self-training. In: international conference on multimedia modeling, Springer, pp 496–505. https://doi.org/10.1007/978-3-319-04114-8_42

  9. Chen HT, He YZ, Hsu CC (2018) Computer-assisted yoga training system. Multimed Tools Appl 77(18):23969–23991. https://doi.org/10.1007/s11042-018-5721-2

  10. Connaghan D, Kelly P, O’Connor NE, et al (2011) Multi-sensor classification of tennis strokes. In: SENSORS, 2011 IEEE, IEEE, pp 1437–1440. https://doi.org/10.1109/ICSENS.2011.6127084

  11. Dantone M, Gall J, Leistner C, et al (2013) Human pose estimation using body parts dependent joint regressors. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 3041–3048. https://doi.org/10.1109/CVPR.2013.391

  12. De Michelis E (2005) A history of modern yoga: Patanjali and western esotericism. A&C Black

  13. Desai M, Mewada H (2023) A novel approach for yoga pose estimation based on in-depth analysis of human body joint detection accuracy. PeerJ Comput Sci 9:e1152

  14. Ditty M, Karandikar A, Reed D (2018) NVIDIA’s Xavier SoC. In: Hot Chips: a symposium on high performance chips

  15. Fang HS, Xie S, Tai YW, et al (2017) RMPE: regional multi-person pose estimation. In: proceedings of the IEEE international conference on computer vision, pp 2334–2343. https://doi.org/10.1109/ICCV.2017.256

  16. Gaiswinkler L, Unterrainer H (2016) The relationship between yoga involvement, mindfulness and psychological well-being. Complement Ther Med 26:123–127

  17. Gan D, Wang Y, Zhang N et al (2017) Enhancing short-term probabilistic residential load forecasting with quantile long-short-term memory. J Eng 14:2622–2627

  18. Gao Z, Zhang H, Liu AA et al (2016) Human action recognition on depth dataset. Neural Comput Appl 27(7):2047–2054. https://doi.org/10.1007/s00521-015-2002-0

  19. Garg S, Saxena A, Gupta R (2022) Yoga pose classification: a CNN and MediaPipe inspired deep learning approach for real-world application. Journal of ambient intelligence and humanized computing, pp 1–12

  20. Graves A, Mohamed Ar, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing, IEEE, pp 6645–6649

  21. Greff K, Srivastava RK, Koutník J et al (2016) LSTM: A search space odyssey. IEEE Trans Neural Netw Learn Syst 28(10):2222–2232. https://doi.org/10.1109/TNNLS.2016.2582924

  22. Guddeti RR, Dang G, Williams MA et al (2019) Role of yoga in cardiac disease and rehabilitation. J Cardiopulm Rehabil Prev 39(3):146–152. https://doi.org/10.1097/hcr.0000000000000372

  23. Gupta A, Gupta HP (2021) YogaHelp: Leveraging motion sensors for learning correct execution of yoga with feedback. IEEE Trans Artif Intell 2(4):362–371

  24. Hsieh CC, Wu BS, Lee CC (2011) A distance computer vision assisted yoga learning system. J Comput 6(11):2382–2388. https://doi.org/10.4304/jcp.6.11.2382-2388

  25. Huang Z, Liu Y, Fang Y, et al (2018) Video-based fall detection for seniors with human pose estimation. In: 2018 4th international conference on universal village (UV), IEEE, pp 1–4

  26. Jain S, Rustagi A, Saurav S et al (2021) Three-dimensional CNN-inspired deep learning architecture for yoga pose recognition in the real-world environment. Neural Comput Appl 33:6427–6441

  27. Joo H, Liu H, Tan L, et al (2015) Panoptic studio: A massively multiview system for social motion capture. In: proceedings of the IEEE international conference on computer vision, pp 3334–3342. https://doi.org/10.1109/ICCV.2015.381

  28. Kelly P, Healy A, Moran K, et al (2010) A virtual coaching environment for improving golf swing technique. In: proceedings of the 2010 ACM workshop on surreal media and virtual cloning, pp 51–56. https://doi.org/10.1145/1878083.1878098

  29. Li J, Zhang D, Shi L et al (2023) An improved high-resolution network-based method for yoga-pose estimation. Appl Sci 13(15):8912

  30. Li Y, Li S, Chen C et al (2020) A plug-and-play scheme to adapt image saliency deep model for video data. IEEE Trans Circuits Syst Video Technol 31(6):2315–2327

  31. Lim SA, Cheong KJ (2015) Regular yoga practice improves antioxidant status, immune function, and stress hormone releases in young healthy people: a randomized, double-blind, controlled pilot study. J Altern Complement Med 21(9):530–538. https://doi.org/10.1089/acm.2014.0044

  32. Liu Y, Stoll C, Gall J, et al (2011) Markerless motion capture of interacting characters using multi-view image segmentation. In: CVPR 2011, IEEE, pp 1249–1256. https://doi.org/10.1109/CVPR.2011.5995424

  33. Lu N, Wu Y, Feng L et al (2018) Deep learning for fall detection: Three-dimensional CNN combined with LSTM on video kinematic data. IEEE J Biomed Health Inform 23(1):314–323. https://doi.org/10.1109/JBHI.2018.2808281

  34. Luo Z, Yang W, Ding ZQ, et al (2011) "left arm up!" interactive yoga training in virtual environment. In: 2011 IEEE virtual reality conference, IEEE, pp 261–262. https://doi.org/10.1109/VR.2011.5759498

  35. Maanijou R, Mirroshandel SA (2019) Introducing an expert system for prediction of soccer player ranking using ensemble learning. Neural Comput Appl 31(12):9157–9174. https://doi.org/10.1007/s00521-019-04036-9

  36. Martinez J, Hossain R, Romero J, et al (2017) A simple yet effective baseline for 3D human pose estimation. In: proceedings of the IEEE international conference on computer vision, pp 2640–2649. https://doi.org/10.1109/ICCV.2017.288

  37. Mohanty A, Ahmed A, Goswami T, et al (2017) Robust pose recognition using deep learning. In: proceedings of international conference on computer vision and image processing, Springer, pp 93–105. https://doi.org/10.1007/978-981-10-2107-7_9

  38. Nordsborg NB, Espinosa HG, Thiel DV (2014) Estimating energy expenditure during front crawl swimming using accelerometers. Procedia Eng 72:132–137. https://doi.org/10.1016/j.proeng.2014.06.024

  39. Okonta NR (2012) Does yoga therapy reduce blood pressure in patients with hypertension?: an integrative review. Holist Nurs Pract 26(3):137–141

  40. Palanimeera J, Ponmozhi K (2023) Yoga posture recognition by learning spatial-temporal feature with deep learning techniques. International journal of image and graphics p 2450055

  41. Pascoe MC, Thompson DR, Ski CF (2017) Yoga, mindfulness-based stress reduction and stress-related physiological measures: A meta-analysis. Psychoneuroendocrinology 86:152–168

  42. Patil S, Pawar A, Peshave A, et al (2011) Yoga tutor visualization and analysis using SURF algorithm. In: 2011 IEEE Control and System Graduate Research Colloquium, IEEE, pp 43–46. https://doi.org/10.1109/ICSGRC.2011.5991827

  43. Prathikanti S, Rivera R, Cochran A et al (2017) Treating major depression with yoga: A prospective, randomized, controlled pilot trial. PLoS ONE. https://doi.org/10.1371/journal.pone.0173869

  44. Przednowek K, Wiktorowicz K, Krzeszowski T et al (2019) A web-oriented expert system for planning hurdles race training programmes. Neural Comput Appl 31(11):7227–7243. https://doi.org/10.1007/s00521-018-3559-1

  45. Qiang B, Zhang S, Zhan Y et al (2019) Improved convolutional pose machines for human pose estimation using image sensor data. Sensors 19(3):718. https://doi.org/10.3390/s19030718

  46. Rector K, Bennett CL, Kientz JA (2013) Eyes-free yoga: an exergame using depth cameras for blind & low vision exercise. In: proceedings of the 15th international ACM SIGACCESS conference on computers and accessibility, pp 1–8. https://doi.org/10.1145/2513383.2513392

  47. Russakovsky O, Deng J, Su H et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y

  48. Sarubin N, Nothdurfter C, Schüle C et al (2014) The influence of hatha yoga as an add-on treatment in major depression on hypothalamic-pituitary-adrenal-axis activity: A randomized trial. J Psychiatr Res 53:76–83

  49. Sathyanarayanan G, Vengadavaradan A, Bharadwaj B (2019) Role of yoga and mindfulness in severe mental illnesses: A narrative review. International journal of yoga 12(1):3. https://doi.org/10.4103/ijoy.IJOY_65_17

  50. Saurav S, Saini R, Singh S (2021) A dual-stream fused neural network for fall detection in multi-camera and 360° videos. Neural computing and applications, pp 1–28

  51. Schure MB, Christopher J, Christopher S (2008) Mind-body medicine and the art of self-care: teaching mindfulness to counseling students through yoga, meditation, and qigong. J Couns & Dev 86(1):47–56. https://doi.org/10.1002/j.1556-6678.2008.tb00625.x

  52. Sethi JK, Nagendra H, Ganpat TS (2013) Yoga improves attention and self-esteem in underprivileged girl student. J Educ health Promot 2. https://doi.org/10.4103/2277-9531.119043

  53. Shan CZ, Ming ESL, Rahman HA, et al (2015) Investigation of upper limb movement during badminton smash. In: 2015 10th asian control conference (ASCC), IEEE, pp 1–6. https://doi.org/10.1109/ASCC.2015.7244605

  54. Sharma A, Agrawal Y, Shah Y, et al (2022) iYogacare: real-time yoga recognition and self-correction for smart healthcare. IEEE Consumer electronics magazine

  55. Shotton J, Fitzgibbon A, Cook M, et al (2011) Real-time human pose recognition in parts from single depth images. In: CVPR 2011, IEEE, pp 1297–1304. https://doi.org/10.1109/CVPR.2011.5995316

  56. Soomro K, Zamir AR, Shah M (2012) UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv:1212.0402. https://doi.org/10.48550/arXiv.1212.0402

  57. Swain D, Satapathy S, Acharya B et al (2022) Deep learning models for yoga pose monitoring. Algorithms 15(11):403

  58. Tian Y, Zitnick CL, Narasimhan SG (2012) Exploring the spatial hierarchy of mixture models for human pose estimation. In: european conference on computer vision, Springer, pp 256–269. https://doi.org/10.1007/978-3-642-33715-4_19

  59. Tompson JJ, Jain A, LeCun Y, et al (2014) Joint training of a convolutional network and a graphical model for human pose estimation. In: advances in neural information processing systems, pp 1799–1807. https://doi.org/10.5555/2968826.2969027

  60. Toshev A, Szegedy C (2014) Deeppose: Human pose estimation via deep neural networks. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 1653–1660. https://doi.org/10.1109/CVPR.2014.214

  61. Tran D, Bourdev L, Fergus R, et al (2015) Learning spatiotemporal features with 3D convolutional networks. In: proceedings of the IEEE international conference on computer vision, pp 4489–4497. https://doi.org/10.1109/ICCV.2015.510

  62. Trejo EW, Yuan P (2018) Recognition of yoga poses through an interactive system with kinect device. In: 2018 2nd international conference on robotics and automation sciences (ICRAS), IEEE, pp 1–5. https://doi.org/10.1109/ICRAS.2018.8443267

  63. Ullah A, Ahmad J, Muhammad K et al (2017) Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE Access 6:1155–1166. https://doi.org/10.1109/ACCESS.2017.2778011

  64. Upadhyay A, Basha NK, Ananthakrishnan B (2023) Deep learning-based yoga posture recognition using the Y_PN-MSSD model for yoga practitioners. In: healthcare, MDPI, p 609

  65. Vallabhaneni N, Prabhavathy P (2023) Segmentation quality assessment network-based object detection and optimized CNN with transfer learning for yoga pose classification for health care. Soft Computing, pp 1–23

  66. Verma M, Kumawat S, Nakashima Y, et al (2020) Yoga-82: a new dataset for fine-grained classification of human poses. In: proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 1038–1039

  67. Waldron M, Twist C, Highton J et al (2011) Movement and physiological match demands of elite rugby league using portable global positioning systems. J Sports Sci 29(11):1223–1230. https://doi.org/10.1080/02640414.2011.587445

  68. Wang C, Wang Y, Lin Z, et al (2014) Robust estimation of 3D human poses from a single image. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 2361–2368. https://doi.org/10.1109/CVPR.2014.303

  69. Wang J, Yu LC, Lai KR, et al (2016) Dimensional sentiment analysis using a regional CNN-LSTM model. In: proceedings of the 54th annual meeting of the association for computational linguistics (Volume 2: short papers), pp 225–230. https://doi.org/10.18653/v1/P16-2037

  70. Wang L, Xu Y, Cheng J et al (2018) Human action recognition by learning spatio-temporal features with deep neural networks. IEEE access 6:17913–17922. https://doi.org/10.1109/ACCESS.2018.2817253

  71. Wei G, Zhou H, Zhang L et al (2023) Spatial-temporal self-attention enhanced graph convolutional networks for fitness yoga action recognition. Sensors 23(10):4741

  72. Wu W, Yin W, Guo F (2010) Learning and self-instruction expert system for yoga. In: 2010 2nd international workshop on intelligent systems and applications, IEEE, pp 1–4. https://doi.org/10.1109/IWISA.2010.5473592

  73. Wu Y, Lin Q, Yang M, et al (2021) A computer vision-based yoga pose grading approach using contrastive skeleton feature representations. In: healthcare, MDPI, p 36

  74. Wu Z, Zhang J, Chen K et al (2019) Yoga posture recognition and quantitative evaluation with wearable sensors based on two-stage classifier and prior bayesian network. Sensors 19(23):5129. https://doi.org/10.3390/s19235129

  75. Xie S, Girshick R, Dollár P, et al (2017) Aggregated residual transformations for deep neural networks. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 1492–1500. https://doi.org/10.1109/CVPR.2017.634

  76. Yadav SK, Singh A, Gupta A et al (2019) Real-time yoga recognition using deep learning. Neural Comput Appl 31(12):9349–9361. https://doi.org/10.1007/s00521-019-04232-7

  77. Yadav SK, Agarwal A, Kumar A et al (2022) YogNet: A two-stream network for realtime multiperson yoga action recognition and posture correction. Knowl-Based Syst 250:109097

  78. Yahya U, Senanayake SA, Naim A (2018) A database-driven neural computing framework for classification of vertical jump patterns of healthy female netballers using 3D kinematics–EMG features. Neural Computing and Applications, pp 1–20. https://doi.org/10.1007/s00521-018-3653-4

  79. Zhang L, Zhu G, Shen P, et al (2017) Learning spatiotemporal features using 3DCNN and convolutional LSTM for gesture recognition. In: proceedings of the IEEE international conference on computer vision workshops, pp 3120–3128

  80. Zhang L, Zhu G, Mei L, et al (2018) Attention in convolutional LSTM for gesture recognition. In: proceedings of the 32nd international conference on neural information processing systems, pp 1957–1966

Acknowledgements

The authors would like to acknowledge the support of the Director, CSIR-CEERI, Pilani, for providing the necessary infrastructure. The authors would also like to thank Yadav et al. for making their yoga pose dataset publicly available.

Funding

No external funding.

Author information

Correspondence to Sumeet Saurav.

Ethics declarations

Conflicts of interest

The authors declare that they have no competing interests that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saurav, S., Gidde, P. & Singh, S. Exploration of deep learning architectures for real-time yoga pose recognition. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18694-y

  • DOI: https://doi.org/10.1007/s11042-024-18694-y
