1 Introduction

Yoga, which originated in India, is a form of healthy exercise that encourages harmony between body and mind. It is one of the oldest sciences in the world and is known to be effective for sustaining physical and mental health and for spiritual evolution [1]. Practicing yoga improves flexibility, energy levels, sleep, posture, muscle strength, circulatory and cardiac health, and athletic performance, and it reduces injuries and chronic pain. Yoga is effective in treating psychiatric disorders and is more effective than therapeutic exercise for chronic lower back pain [2, 3]. In one survey, 89.4% of respondents perceived that yoga helps relieve joint pain, muscle pain, headaches, and depression [4]. However, improper yoga positions have been shown to cause sprains and pain in the neck, waist, shoulders, wrists, knees, and muscles. Specifically, a survey of 34 countries in 2007 found that practicing yoga incorrectly generally causes neck injuries [5]. Moreover, a survey of yoga-related injury cases (2001–2014) found that, among the 46.6% of injuries resulting from poor practice, injuries to the lower limbs (21.9%) and sprains (45.03%) were observed [6]. In one Canadian study (1991–2010), the most common type of injury was a sprain (23/67, 34%), and the lower extremity was the most commonly affected body region (27/67, 42%) [7].

Several recent studies have combined artificial intelligence (AI) [8, 9] with yoga training to prevent such injuries. In previous studies, eye-free yoga was studied using Microsoft Kinect as a yoga instructor [10], and a computer-vision-assisted yoga learning system was studied in [11]. AI-based yoga posture estimation was proposed by Girija et al. [12], who surveyed posture estimation methods and applied them in an Android application. They presented an Android graphical user interface (GUI) that took the user's video and predicted the posture angles using the PoseNet algorithm, for comparison with the model training database. The feedback given to the user was based on the calculated angles. Moreover, Chinnaaiah et al. [13] proposed real-time assistance embedded in the yoga mat. This approach identifies pressure nodes using force-sensitive resistor (FSR) sensors. The collected pressure data are passed to a pattern-detection algorithm, and biofeedback for posture correction is provided to the subject in real time.

The remainder of this manuscript is organized as follows. Section 2 reviews related work, and Sect. 3 describes the materials and methods of this study. The experimental results are provided in Sect. 4. The work is discussed in Sect. 5 and concluded in Sect. 6.

2 Related work

Several companies have developed technology products for the sports and exercise domain. Wearable X [14] introduced a wearable product called NADI X-Smart Yoga Pants, which guides exercise form via a mobile application. Another company, SmartMat [15], developed an intelligent yoga mat with an advanced sensor that detects the pressure nodes of a posture on the mat and provides real-time feedback, via a mobile application, on whether the user has performed the yoga posture correctly or incorrectly. YogaNotch [16] offers a wearable yoga assistant, placed on the body, which provides audio feedback on alignment when the user practices yoga at home. Another currently popular exercise guidance product is MIRROR [17], a movable mirror for use in a home gym that enables exercises to be viewed together with the user’s reflection.

However, yoga posture recognition via a mobile application [12] is complicated, with narrow monitoring for users to achieve the correct feedback. In [13], the proposed yoga mat with a pressure sensor was inadequate for correcting whole-body posture. Furthermore, the products developed in [14,15,16] do not let users see their reflection while exercising. The product that does provide a reflective display, i.e., MIRROR [17], includes a camera but offers no body-posture detection or correction. Additionally, MIRROR is expensive to use, given the product price and the monthly training class subscription, which could be a barrier to many.

Yoga classification plays a significant role in many studies using different approaches, such as posture estimation using the OpenPose algorithm [18] and posture recognition by machine learning (ML) and deep learning (DL) techniques. Palanimeera et al. [19] presented ML-based yoga posture classification for multiple people, detecting posture in real time by estimating a 3D posture from an RGB camera. They considered 12 yoga postures from the sun salutation asana, captured by a webcam. The method created a skeleton, applied feature extraction, and classified the sun salutation yoga postures with ML models such as support vector machine (SVM), k-nearest neighbors (kNN), naïve Bayes, and logistic regression, reporting a result of 96%.

Kumar et al. [20] proposed yoga posture estimation using a keypoint detection method, the OpenPose algorithm, on the public yoga asanas dataset [21]. The data were collected as video frames of six yoga asanas: cobra posture, lotus posture or sitting posture, corpse posture, mountain posture, triangle posture, and tree posture.

Nagalakshmi et al. [22] applied the learning of 3D landmark points, from a single image, using skinned multi-person linear (SMPL) and an encoded architecture of classification models. These models included kNN, SVM, and other deep learning models such as AlexNet, VggNet, and ResNet. Each model was utilized with the proposed dataset that was based on a collection of 13 different yoga postures. The dataset includes a half-camel posture, standing half-forward bend posture, butterfly posture, cobra posture, bridge posture, sitting posture, standing forward bend posture, child posture, corpse posture, mountain posture, tree posture, triangle posture, and twisted posture. Classification was evaluated with an accuracy of 83%.

Chen et al. [23] introduced a computer-assisted self-training system that used Kinect with extraction of the user’s body contour and star skeleton for asana recognition. It covered three yoga postures: tree posture, warrior 3 posture, and downward-facing dog posture. They achieved an overall accuracy of only 82.84% in their experiments. Related earlier work [24] provided a self-training system that estimated yoga posture by extracting visual features from image data in order to analyze 12 yoga postures and correct posture performance. The 12 postures were tree posture, warrior 3 posture, warrior 2 posture, warrior 1 posture, downward-facing dog posture, extended hand-to-big-toe posture, chair posture, full boat posture, cobra posture, side plank posture, plank posture, and lord of the dance posture. The authors captured the user's body map, applied a topological skeleton to extract the feature points of the practitioner’s body, and visualized posture instructions for adjusting to the correct posture. Posture recognition was evaluated by a yoga expert frame by frame, achieving 93.45% accuracy.

Manisha et al. [25] proposed a new dataset, called Yoga-82, for the fine-grained classification of human yoga postures. They modified the DenseNet201 model to classify yoga postures within their proposed Yoga-82 dataset and achieved a top-1 image classification score of 79% over 82 classes.

The purpose of this study was to develop a yoga posture coaching system based on transfer learning. We collected yoga posture data for 14 yoga posture classes and proposed a yoga posture coaching system with an interactive display. The system uses transfer learning, with pre-trained weights from a CNN architecture model, to recognize a yoga posture in real time. Furthermore, our yoga posture coaching system provides posture instruction feedback when a user performs yoga in front of it.

3 Materials and methods

3.1 Data acquisition

Because access to datasets for yoga recognition is limited, we collected yoga posture images from eight volunteers (6 women and 2 men), some of whom had never practiced yoga before. Each volunteer performed each of the 14 yoga postures 10 times. The 14 yoga postures were bridge posture, cat-cow posture, child posture, cobra posture, corpse posture, downward-facing dog posture, sitting posture, extended side angle posture, plank posture, seated forward bend posture, tree posture, triangle posture, warrior II posture, and warrior I posture, as shown in Fig. 1. The images were captured using an HD 1080p Logitech c920 pro webcam with an image resolution of 720\(\times \)960. In total, 1,120 images were collected and divided into 60% for the training set, 20% for the validation set, and 20% for the testing set for yoga posture classification.

Fig. 1 Sample of 14 different yoga postures
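The stated dataset size and split proportions can be checked with a short sketch. The code below is illustrative only: the text does not describe how the split was drawn, so the per-class (stratified) shuffling, the seed, the `stratified_split` helper, and the placeholder image identifiers are all assumptions. It shows only the nominal counts implied by 8 volunteers, 14 postures, 10 repetitions, and a 60/20/20 split.

```python
import random


def stratified_split(items_by_class, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split each class separately so every posture keeps the same proportions.

    Hypothetical helper: stratification and the seed are assumptions, not
    details taken from the paper.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_train = round(fractions[0] * len(shuffled))
        n_val = round(fractions[1] * len(shuffled))
        train += [(x, label) for x in shuffled[:n_train]]
        val += [(x, label) for x in shuffled[n_train:n_train + n_val]]
        test += [(x, label) for x in shuffled[n_train + n_val:]]
    return train, val, test


# Placeholder identifiers: 8 volunteers x 10 repetitions per posture, 14 postures.
dataset = {
    f"posture_{k}": [f"p{k}_v{v}_r{r}" for v in range(8) for r in range(10)]
    for k in range(14)
}
train, val, test = stratified_split(dataset)
print(len(train), len(val), len(test))
```

A split by volunteer rather than by image would be a stricter test of generalization to new users; the text does not say which was used.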

3.2 Data augmentation

Data augmentation (DA) is used to boost model performance and reduce overfitting by increasing the quantity of training data [26]. Because training a deep learning model requires a large training set, DA was used in this study. We applied shear augmentation in the range of 0.2, zoom of 0.2, width shift and height shift in the range of 0.12, brightness in the range [1, 1], and horizontal flip. Figure 2 shows an example of each augmentation.

Fig. 2 Examples of data augmentation in this study. a Original, b Horizontal flip, c Shear, d Zoom, e Height shift, f Width shift, and g Brightness
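To make the augmentation ranges above concrete, the following sketch draws one random set of transform parameters per image. It is a plain-Python illustration, not the authors' pipeline. The dictionary keys follow the argument names of Keras `ImageDataGenerator`, but the sampling function `sample_augmentation` is hypothetical, and the reading of each range is an assumption: symmetric shear, zoom in [1 − z, 1 + z], and shifts as a fraction of image size. The text does not state the unit of the shear value.

```python
import random

# Ranges as stated in the text; key names mirror Keras ImageDataGenerator arguments.
AUGMENTATION = {
    "shear_range": 0.2,
    "zoom_range": 0.2,
    "width_shift_range": 0.12,
    "height_shift_range": 0.12,
    "brightness_range": (1.0, 1.0),
    "horizontal_flip": True,
}


def sample_augmentation(cfg, rng):
    """Draw one random set of transform parameters from the configured ranges."""
    return {
        "shear": rng.uniform(-cfg["shear_range"], cfg["shear_range"]),
        "zoom": rng.uniform(1 - cfg["zoom_range"], 1 + cfg["zoom_range"]),
        "width_shift": rng.uniform(-cfg["width_shift_range"], cfg["width_shift_range"]),
        "height_shift": rng.uniform(-cfg["height_shift_range"], cfg["height_shift_range"]),
        "brightness": rng.uniform(*cfg["brightness_range"]),
        # Flip each image with probability 0.5 when flipping is enabled.
        "flip": cfg["horizontal_flip"] and rng.random() < 0.5,
    }


params = sample_augmentation(AUGMENTATION, random.Random(0))
print(params)
```

Note that a brightness range whose two bounds are equal always yields the same factor, so under this reading the stated range [1, 1] leaves brightness unchanged.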

3.3 Transfer learning for yoga posture classification

3.3.1 Feature extraction

Transfer learning (TL) is a learning technique that transfers previously acquired knowledge to another domain for classification and feature extraction purposes [27]. TL takes a model with a general CNN architecture that was pre-trained on a large dataset and adapts it to predict on a new, smaller target dataset in a related task. Generally, a CNN model consists of feature extraction layers and a classification layer. With transfer learning, the feature extraction layers are trained on large data (the ImageNet dataset), and the classification layer is fine-tuned to predict the target dataset.

In this study, we used six different pre-trained models, namely VGG16, VGG19, MobileNet, MobileNetV2, InceptionV3, and DenseNet201 [28,29,30,31,32], each with weights trained separately on the ImageNet dataset [33], to extract our yoga posture features. The training process begins from the pre-trained weights of each base model and freezes the top layers of the pre-trained model to tune the model for our yoga posture dataset. Figure 3 shows the transfer learning process flow used in this study.

Fig. 3 Transfer learning process from pre-trained on ImageNet dataset to learn new target datasets

3.3.2 Deep neural network classifier

After extracting features with the pre-trained model, the classification layer of each model was replaced by new fully connected layers that turn the extracted features into predictions over our 14 yoga posture categories. Dropout with a rate of 20% was applied between the fully connected layers, and the output was normalized with the softmax function for multi-class classification. The network was compiled with the categorical cross-entropy loss function and the Adam optimizer and trained for up to 100 epochs. To prevent overfitting, the validation loss was tracked, and training was stopped when it no longer decreased, using early stopping with a min delta value of 0.001 and a patience of 10.
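The early-stopping rule described above can be sketched in plain Python. This is an illustrative reading of "min delta 0.001, patience 10", not the authors' code: an epoch counts as an improvement only if the validation loss drops by more than the min delta below the best value so far, and training stops after `patience` consecutive epochs without improvement. The function name `early_stopping_epoch` and the example loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, min_delta=0.001, patience=10):
    """Return the 1-based epoch at which training would stop, or None.

    An epoch improves only if its loss is lower than the best loss so far
    by more than min_delta; `patience` non-improving epochs in a row stop
    the run.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation-loss curve: three improving epochs, then a plateau.
losses = [1.0, 0.5, 0.4] + [0.4] * 10
print(early_stopping_epoch(losses))
```

With a cap of 100 epochs, a run whose validation loss keeps improving by more than the min delta would simply train to the cap.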

3.3.3 Model training setting

In this experimental setup, we trained the models in the Python programming language with the main libraries TensorFlow, Keras, OpenCV, NumPy, PIL, and scikit-learn. All yoga images were resized to a resolution of 224\(\times \)224, with three channels, to make them suitable for model training. The machine used for training the yoga posture classification models featured a dual-processor setup: 2\(\times \) Intel Xeon Silver 4114 CPUs @ 2.20 GHz, 12\(\times \)16 GB DIMM DDR4 Synchronous RAMs @ 2400 MHz, 3\(\times \)512 GB Samsung 970 NVMe M.2 SSDs, and 3\(\times \)NVIDIA TITAN RTX GPUs, each with 24 GB GDDR6 @ 1770 MHz and 4608 Compute Unified Device Architecture (CUDA) cores.

3.4 Performance metrics

The performance metrics are used to evaluate the classification based on four confusion matrix categories: true positive (TP), false positive (FP), true negative (TN), and false negative (FN), following a multi-class classification table based on [34]. Overall accuracy (OA) is the percentage of all samples that were correctly predicted. Sensitivity measures the ability to classify positive samples correctly (true positives), and specificity measures the ability to classify negative samples correctly (true negatives) [35]. The Matthews correlation coefficient (MCC) represents the correlation between what was observed and what was predicted, where an MCC value close to +1 indicates perfect prediction, −1 reflects disagreement between the prediction and reality, and 0 means random classification [36, 37]. To evaluate the yoga posture classification performance, we measured OA, sensitivity, specificity, and MCC, based on the confusion matrix, using Eqs. 1, 2, 3, and 4, respectively, where n is the total number of yoga postures.

$$\begin{aligned} OA = \frac{TP_{1} + TP_{2} + \cdots + TP_{n}}{\text {Total testing set}} \times 100 \end{aligned}$$
(1)
$$\begin{aligned} Sensitivity = \frac{TP}{TP + FN} \times 100 \end{aligned}$$
(2)
$$\begin{aligned} Specificity = \frac{TN}{TN + FP} \times 100 \end{aligned}$$
(3)
$$\begin{aligned} MCC = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FN)(FP + TN)(TP + FP)(TN + FN)}} \end{aligned}$$
(4)
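As a worked illustration of Eqs. 1 to 4, the sketch below computes the four measures from a multi-class confusion matrix using one-vs-rest counts for each class. The layout (rows are true classes, columns are predicted classes), the function names, and the small three-class example matrix are assumptions made for illustration; they are not taken from the study's results.

```python
import math


def per_class_counts(cm, k):
    """One-vs-rest TP, FP, TN, FN for class k (rows = true, columns = predicted)."""
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = sum(map(sum, cm)) - tp - fn - fp
    return tp, fp, tn, fn


def overall_accuracy(cm):
    """Eq. 1: correctly predicted samples over the whole testing set, in percent."""
    return 100.0 * sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))


def sensitivity(tp, fp, tn, fn):
    """Eq. 2, in percent."""
    return 100.0 * tp / (tp + fn)


def specificity(tp, fp, tn, fn):
    """Eq. 3, in percent."""
    return 100.0 * tn / (tn + fp)


def mcc(tp, fp, tn, fn):
    """Eq. 4; returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fn) * (fp + tn) * (tp + fp) * (tn + fn))
    return ((tp * tn) - (fp * fn)) / denom if denom else 0.0


# Hypothetical 3-class confusion matrix, for illustration only.
example_cm = [
    [5, 0, 0],
    [1, 4, 0],
    [0, 0, 5],
]
print(overall_accuracy(example_cm))
for k in range(len(example_cm)):
    counts = per_class_counts(example_cm, k)
    print(k, sensitivity(*counts), specificity(*counts), mcc(*counts))
```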

3.5 Yoga self-coaching system

3.5.1 Coaching system feedback

For instructional feedback, the keypoints of the human body were estimated using the MediaPipe algorithm [38]. We extracted the body landmarks of each keypoint to enable calculation of the angle at each body joint, e.g., left/right shoulder, left/right elbow, left/right wrist, left/right hip, left/right knee, and left/right ankle. To provide instructions for an incorrect posture, each joint angle was calculated from three points using Eq. 5.

$$\begin{aligned} \theta = \frac{180 \times (atan2((c_{y} - b_{y}), (c_{x} - b_{x})) - atan2((a_{y} - b_{y}), (a_{x} - b_{x})))}{\pi } \end{aligned}$$
(5)

where c, b, and a are the first, middle, and last of the three points, respectively, and \(\theta \) is the angle at the middle point b.
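Eq. 5 can be sketched in a few lines of Python. The function below applies the equation as written to three 2D points. The optional folding of the signed result into the range 0 to 180 degrees is an assumption added for convenience (the text does not say how negative or reflex angles are handled), and the example coordinates are hypothetical.

```python
import math


def joint_angle(a, b, c, fold=True):
    """Angle in degrees at the middle point b, following Eq. 5.

    a, b, c are (x, y) pairs. With fold=True the signed result is mapped
    to [0, 180]; this normalization is an assumption, not part of Eq. 5.
    """
    theta = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    if fold:
        theta = abs(theta)
        if theta > 180.0:
            theta = 360.0 - theta
    return theta


# Hypothetical normalized image coordinates for shoulder, elbow, and wrist.
shoulder, elbow, wrist = (0.40, 0.30), (0.50, 0.45), (0.62, 0.38)
print(joint_angle(shoulder, elbow, wrist))
```

For an elbow, for instance, a would be one neighboring landmark (shoulder or wrist), b the elbow, and c the other neighbor; swapping a and c changes only the sign of the unfolded result.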

In the yoga coaching system, the user's posture is recognized as one of the 14 yoga postures. When the user performs a posture that does not match the selected yoga image guide, the joint angles are calculated from the estimated keypoints and checked against an angle-checking condition. Figure 4 shows the process flow of the real-time yoga coaching system used in this study.

Fig. 4 Real-time coaching feedback flowchart of yoga self-coaching system
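The feedback flow described in this subsection might be sketched as follows. This is a hypothetical illustration rather than the authors' implementation: the function `posture_feedback`, the joint names, the reference angles, and the 15-degree tolerance are all assumptions, since the text does not give the angle-checking condition. Only the two status messages are taken from the system description.

```python
def posture_feedback(selected, predicted, user_angles, reference_angles,
                     tolerance_deg=15.0):
    """Return a status message and a list of per-joint hints.

    Angles are checked only when the recognized posture differs from the
    selected one, mirroring the flow described in the text. The tolerance
    and the hint wording are assumptions.
    """
    if predicted == selected:
        return "Correct :)", []
    hints = []
    for joint, target in reference_angles.items():
        measured = user_angles.get(joint)
        if measured is None:
            continue  # keypoint not detected in this frame
        if abs(measured - target) > tolerance_deg:
            hints.append(f"{joint}: measured {measured:.0f} deg, target {target:.0f} deg")
    return "Incorrect :(", hints


# Hypothetical values for a plank attempt that was recognized as cobra.
status, hints = posture_feedback(
    selected="plank",
    predicted="cobra",
    user_angles={"left_elbow": 120.0, "left_knee": 178.0},
    reference_angles={"left_elbow": 180.0, "left_knee": 180.0},
)
print(status, hints)
```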

3.5.2 Yoga self-coaching prototype

In this study, we implemented a yoga self-coaching system connected to an RGB camera. We proposed embedding the best-performing yoga posture classification model investigated in this study in the Jetson Nano Development Kit for yoga self-exercise at home. The Jetson Nano Development Kit is a microprocessor produced for developing and learning in AI research [39]. We also built a yoga self-coaching mirror based on a Jetson Nano processor connected to a USB camera, a Logitech c920 pro webcam, to recognize the yoga posture in real time. The required distance for using the yoga self-coaching system is 2.5 meters.

4 Results

4.1 Comparison between models in yoga posture classification

Table 1 compares the results of the six transfer learning models investigated for yoga posture classification in this study. The transfer learning model based on pre-trained MobileNet with DA (TL-MobileNet-DA) achieved an overall accuracy of 98.43%, whereas TL-VGG16-DA, TL-VGG19-DA, TL-MobileNetV2-DA, TL-InceptionV3-DA, and TL-DenseNet201-DA achieved 94.90%, 94.90%, 92.16%, 91.76%, and 98.04%, respectively. The sensitivity of TL-MobileNet-DA was the highest (98.30%), and its specificity was 99.88%. Its MCC value of 0.9831 is close to +1, which means that this model makes accurate predictions in yoga posture classification. In contrast, the TL-InceptionV3-DA model had the weakest performance on our yoga posture dataset, with an overall accuracy of 91.76%, a sensitivity of 91.07%, a specificity of 99.36%, and an MCC value of 0.91. Based on the classification results in Table 1, transfer learning on MobileNet with DA is recommended for use in the yoga self-coaching system.

Table 1 Comparison of competitive models in this study

Table 2 summarizes the sensitivity, specificity, and MCC for each posture from the TL-MobileNet-DA model. Each performance metric of the TL-MobileNet-DA model was evaluated based on the confusion matrix shown in Fig. 5. The model achieved the highest sensitivity, 100%, for bridge posture, cat-cow posture, child posture, downward-facing dog posture, sitting posture, extended side angle posture, seated forward bend posture, tree posture, triangle posture, and warrior II posture, while the lowest sensitivity, 93.33%, was obtained for plank posture. Moreover, the highest specificity, 100%, was achieved for bridge posture, cobra posture, corpse posture, downward-facing dog posture, sitting posture, extended side angle posture, plank posture, triangle posture, warrior II posture, and warrior I posture, while the lowest specificity, 99.56%, was found for tree posture. A perfect prediction for a posture yields an MCC value of 1, and this was achieved for bridge posture, downward-facing dog posture, sitting posture, extended side angle posture, triangle posture, and warrior II posture. Figure 6 shows the normalized confusion matrix for yoga posture prediction using the TL-MobileNet-DA model. In the normalized confusion matrix, perfect predictions were achieved for 10 of the 14 yoga postures included in this study: bridge posture, cat-cow posture, child posture, downward-facing dog posture, sitting posture, extended side angle posture, seated forward bend posture, tree posture, triangle posture, and warrior II posture.

Table 2 Classification performance for each posture from the TL-MobileNet-DA model

Fig. 5 Confusion matrix of the testing set from the TL-MobileNet-DA model

Fig. 6 Normalized confusion matrix from prediction using the TL-MobileNet-DA model

4.2 Development of yoga self-coaching system

The yoga self-coaching system was developed entirely in the Python programming language, with a user interface based on the Tkinter package. Tkinter is the standard Python interface for the Tk graphical user interface (GUI) toolkit [40]. Figure 7 shows the user interface of the yoga self-coaching system, which captures the user’s posture from a USB camera at the top of the mirror panel and then recognizes the yoga posture using the TL-MobileNet-DA model embedded in the Jetson Nano kit. A user can choose one of the 14 yoga postures studied in this paper to practice. When a user confirms the selected posture for evaluation, the guide yoga image is displayed on the system interface. The user can then check their posture against the guidance image while posture recognition runs in real time. The system shows the success message “Correct :)” when the user achieves the right posture and the fail message “Incorrect :(” when the user has not achieved the correct posture. The recognized yoga label from our system is shown below the performance checking results, as shown in Fig. 7. Furthermore, when a user performs the yoga posture incorrectly, the system provides instructional feedback to guide them toward the correct posture, based on the angle of each joint calculated from the extracted keypoints.

Fig. 7 Snapshots of the yoga self-coaching system

5 Discussion

In the literature, several studies have proposed algorithms and applied many techniques to recognize yoga postures based on ML or DL [19,20,21,22,23,24,25]. Because of the small amount of yoga posture data available on the Internet, other researchers collected their own yoga posture data. Likewise, in our experiment, we collected a yoga posture dataset of 14 different yoga postures in our indoor laboratory with an RGB camera. To recognize the yoga postures, we used a different approach from previous studies: we applied transfer learning with a general pre-trained CNN architecture for feature extraction and modified the classifier layer to predict our yoga posture dataset.

The results indicated that transfer learning on MobileNet worked well with DA on our yoga posture dataset, yielding a better overall prediction accuracy than the other competitive models in this study. The classification is reliable, as indicated by an MCC value close to +1. Although the TL-MobileNet-DA model showed a slight inaccuracy in predicting the plank posture, the recognition model is still acceptable for embedding within a yoga self-coaching system.

Because the yoga posture datasets used in earlier studies are not accessible, a direct comparison with the literature is not possible. The work in [23] provided a self-training system for posture recognition of three postures (tree, warrior III, and downward-facing dog), whereas we considered 14 asanas, collected with a normal RGB webcam. In [23], five participants performed each of the three yoga postures 5 times, while in our case, eight participants volunteered to perform all 14 yoga postures 10 times. The overall accuracy of their method was 82.84%. Moreover, Chen et al. [24] used the skeleton with body contour to extract the feature points of twelve yoga postures, collected from the Kinect sensor, and achieved an accuracy of 93.45%. However, prior studies used manual feature extraction and built a separate model for each yoga posture, which is time-consuming when new postures are added. With our approach, new classes of yoga postures can be incorporated by adding more neurons to the last dense layer and re-training on the new dataset. Our method achieved an accuracy of 98.43%, which is higher than the accuracies reported by Chen et al. In addition, our work comprises a coaching system, called the yoga self-coaching system, with real-time instructional feedback based on body joint angles calculated from MediaPipe keypoint extraction.

Among the yoga assistance products developed by several companies, the Yoga Pants, SmartMat, and YogaNotch [14,15,16] do not enable users to view their reflection while doing the exercises; the MIRROR [17] does enable the viewing of one’s reflection but does not provide any posture recognition. Our proposed yoga self-coaching system addresses both of these limitations.

Nonetheless, our study established a yoga system that recognized 14 different yoga postures for beginner yoga practice. The different postures provide different health benefits. Thus, more yoga postures need to be investigated in the future to improve our proposed system.

6 Conclusions

This paper presented a yoga self-coaching system based on transfer learning techniques. The first step of this study was to collect the yoga posture dataset using a normal RGB webcam and then to apply data augmentation techniques. The transfer learning technique, pre-trained on the MobileNet model, was investigated. In the last phase, we established an AI yoga system utilizing a prediction model for real-time inference.

In conclusion, the yoga posture classification method achieved an accuracy of 98.43%, and the resulting model was embedded within our yoga self-coaching system. The yoga self-coaching system was developed to recognize yoga postures against the selected yoga posture guide, to output the predicted result, and to give real-time guidance for incorrect postures. The identification of incorrect posture is based on joint angles calculated from keypoints estimated using the MediaPipe algorithm.

In summary, we developed a yoga self-coaching system that can predict yoga posture and provide instructional feedback in real time. Home training has increased since the start of COVID-19, and in our opinion, our system supports this trend by recognizing yoga postures and providing instruction in real time.