1 Introduction

Yoga, a complex form of exercise, has its roots in ancient spirituality and dates back more than 5000 years. It is renowned for its physical, mental, and spiritual benefits. Through postures that have been refined over the past few decades, it strengthens the body and mind. Yoga should be practised under the guidance of an expert, because incorrect posture can cause health problems such as muscle pulls, ankle sprains, and stiff necks. For this reason, a trainer should oversee the session and correct the practitioner’s posture.

Correct yoga instruction has become more widely available through yoga education and self-teaching platforms. Computer programmes that use computer vision to analyse practitioners’ posture can help them become aware of the correct technique.

One can practise yoga alone only after receiving sufficient training, yet more and more people are inclined to practise yoga at home. Since not all practitioners have access to trainers, an artificial intelligence-based programme can be used to recognise yoga poses and offer personalised advice that helps people improve their form. Mobile devices and virtual coaching software can be used for this purpose. To help users improve their posture, we propose an artificial intelligence-based application that recognises yoga poses and responds with tailored feedback.

Human pose estimation is a significant topic in computer vision, with applications in assisted living [1], intelligent driver assistance systems [2], behaviour analysis [3], and visual surveillance [4]. Pose estimation has seen significant performance gains since the advent of deep neural networks.

Computer vision can be used to standardize and correct yoga positions. Incorrect yoga poses can cause serious injuries and long-term complications (Narayanan [5]). Analyzing human posture can identify and correct abnormal positions, improving well-being at home [6].

A pose estimator extracts the properties of yoga asanas from suitably represented photos. The extracted features are fed directly to neural network and machine learning models, which assess and predict the correctness of the yoga position [7].

The contributions of this paper are as follows:

  1. Four new ensemble-based yoga pose estimation models are introduced that boost the performance of three individual DL techniques: Xception, VGGNet, and SqueezeNet.

  2. A new core residual convolutional neural network is combined with each of the DL techniques to form three models.

  3. The core model is integrated with LDA and GDA features, either together or separately.

  4. The proposed novel models, LGDeep, GDeep, LDeep, and Deep, are examined on a publicly accessible dataset and exceed earlier state-of-the-art approaches.

  5. LGDeep surpasses previous approaches designed for the same purpose, since ensembling captures complementary information from the individual models.

The remainder of this paper is organised as follows. Section 2 presents the literature survey. Section 3 covers the preliminaries and the techniques used. Section 4 describes the proposed method in detail. Section 5 presents the performance analysis, reporting the results of the proposed methods on the dataset. Section 6 concludes the paper and summarises its contributions.

2 Literature survey

Deep learning makes it possible to analyse vast volumes of data in a scalable way. Unlike typical machine learning models, which depend on manual feature engineering or extraction, deep learning identifies complicated patterns in data and extracts features on its own.

Using deep learning, specifically a CNN, Kothari [8] presented a computational technique for classifying yoga poses from photographs. A dataset of 1000 images divided into six classes was used for the classification model, which achieved an accuracy of over 85%. Drawing on domain knowledge of five yoga postures, Chaudhari et al. [9] developed a method that gives the practitioner clear feedback so that they can practise the postures correctly. Yoga poses were identified using a CNN model, while posture flaws were detected using a human-joint localization model.

With the development of image processing and recognition in 2D and 3D space, research on human pose estimation has expanded dramatically [10, 11]. Chen [10] presented a detailed survey of image-based, monocular human pose estimation using deep learning methods. They pointed out that several pose estimation approaches are now available, including methods that are free of or based on a human body model, body joint point mapping, pixel-level analysis, and heatmap mapping. The case for deep learning in human pose estimation rests on the capacity of CNNs to produce state-of-the-art results, established when AlexNet sparked a revolution in image classification that is still going strong today [12].

Faisal [13] expanded on Chen’s body-joint estimation study [10] and reviewed the previous art, in which various researchers identified body-joint locations using gyroscope and joint angle methods for determining joint point angles, as well as multiple sensor fusion.

Nagalakshmi examined the effects of yoga and found that, in addition to helping treat musculoskeletal diseases and encouraging a healthy lifestyle, remote yoga activities have become significantly more popular since the COVID epidemic [14]. After a significant gap in remote yoga pose estimation was revealed, researchers contributed yoga posture recognition techniques as distinct and influential studies within human pose estimation [14]. To this end, Agarwal [15] experimented with various machine learning methods. They built a dataset of 5,500 photos covering 10 distinct yoga positions and then used a pose estimation technique to extract the skeletal images. On their dataset, they found that random forest classifiers performed better than the other methods.

Liaqat et al. [16] created a hybrid posture identification approach by combining deep neural networks with traditional machine learning methods. The weights learned by the deep learning model and the prediction of the conventional model are combined to obtain the final class prediction. The study by Byeon and his team shows the application of ensemble deep models in difficult domestic settings with a variety of backgrounds [17]. They came to some intriguing conclusions about the applicability of exercise programmes in homes for senior citizens and about how exercising in the comfort of one’s own home fits the philosophy developed in the post-COVID period.

Nagalakshmi’s work [18] focused specifically on yoga while exploring the use of SVM and boosting for pose detection. Kossaifi [19] applied a tensor-parameterized method to obtain highly compressed models with commendable accuracy on the human pose estimation task, addressing the redundancy caused by over-parameterization. For yoga posture recognition, Trejo et al. [20] proposed an interactive yoga system for pose detection whose model can recognise the postures of six people at once. Their model was trained using the AdaBoost method and identified many yoga poses in real time. It is worth noting that combining convolution, in particular depthwise separable convolution, with batch normalisation and pooling, along with additional aspects of the task’s performance metrics, makes the architecture significantly more complex. They achieved an accuracy of 94.78%. The system needs to be expanded with more ways of interacting with the user in order to record performance data.

Santosh Kumar Yadav et al. [21] proposed a yoga detection approach using a regular RGB camera. They collected the data using a high-definition webcam, processed the user’s image with OpenPose, and identified keypoints. A time-distributed CNN layer was used to identify patterns between keypoints in a single frame, while the LSTM was used to remember the patterns seen in recent frames. Their approach did away with the need for Kinect or any other specialised hardware to recognise yoga postures.

Using deep learning, Shrajal [22] proposed an effective method for recognising yoga poses in challenging real-world settings. A 3D CNN architecture was designed and implemented. The developed architecture obtained a competitive test recognition accuracy of 99.39% as well as a multifold increase in execution speed. The current scheme’s reliance on OpenPose makes it extremely vulnerable to outside noise and reduces the system’s performance in challenging real-world scenarios.

Manisha [23] proposed a fine-grained hierarchical pose classification method in which pose estimation is formulated as a classification problem. They introduce the Yoga-82 dataset for large-scale yoga pose recognition with 82 classes. The dataset includes different body postures, body positions, and the names of the actual poses in a three-level hierarchy. To make use of the hierarchical labels, several hierarchical DenseNet variants are presented. The best result is obtained by DenseNet-169, with a classification accuracy of 91.44%. A limitation is that explicit constraints among the predicted labels at the different class levels remain to be introduced.

The authors of [24] proposed a yoga posture recognition system using the Y-PN-MSSD model. The model has three stages. The first is data collection and preparation, in which yoga poses are recorded from four users and combined with an open-source dataset of seven poses. The model is then trained on the acquired data, with features extracted by connecting important body parts. Finally, once the yoga posture has been identified, the model tracks the user live through the yoga positions. Using the Y-PN-MSSD model, seven yoga postures are identified with an overall accuracy of 99.88%. The disadvantage is that additional implementation difficulties, such as occlusion and lighting changes, must be dealt with.

Chen [25] proposes a self-training approach for yoga that aims to teach the practitioner how to perform yoga poses correctly, helping to correct bad postures and avoid injury. The system extracts the body shape, dominant axes, feature points, and skeleton to examine both front and side views of the practitioner’s pose. Front view posture Y1 has a 99.87% accuracy rate. The system may fail when extracting feature points for certain poses.

According to Mehar et al. [26], yoga may play a role in heart recovery and can be incorporated into cardiovascular rehabilitation. The study proposes a classification and rating system for the effectiveness of yoga workouts. To assess the quality of yoga asanas, quality scores for input movements are generated using ConvNet and deep belief network models. Their method achieved a precision of 99.99%.

To detect five different yoga poses from relatively small amounts of data, Ashraf [27] proposed a deep learning-based approach. Their YoNet design separates the depth and spatial information of the image and takes each into account for classification. According to the experimental findings, it has a precision of 95.61% and an accuracy of 95.61%.

To develop a random forest classifier for assessing yoga asanas, Mustafa [28] leveraged transfer learning with human pose estimation algorithms to extract 136 keypoints distributed over the body.

They use the well-known pose estimation technique AlphaPose in the first stage and a random forest classifier in the second. They developed a new dataset for classifying yoga poses that covers various viewpoints. Their accuracy was 91.21%. The approach relies on pose estimation and precise keypoint identification.

Meena [29] proposed a CNN model for facial expression sentiment analysis that can handle large data and perform better. A CNN-based facial image sentiment analysis model covers word ambiguity, multi-linguality, granularity, and opinion expression differences across textual genres. They extensively tested the CNN3 model on the CK+ and FER-2013 datasets and found that it outperforms state-of-the-art methods with 79% and 95% accuracy, respectively. They used multiple measures to evaluate the proposed deep learning model against Naive Bayes, SVM, logistic regression, random forest, and decision tree.

Meena [30] presented an image-based sentiment analysis system that uses CNN models and Inception-V3 transfer learning to analyze people’s sentiments and solve sentiment analysis challenges. With 99.5% accuracy, their machine learning method outperformed others. They compared the Inception V3-based model to VGG 19 and other transfer learning models. They used CK+, FER2013, and JAFFE datasets to create sentiment analysis information management systems.

Rawat [31] developed a vision-based hand sign recognition system for Indian Sign Language (ISL) interrogative phrases. They proposed a CNN-based recognition and classification approach for isolated dynamic ISL gestures. To address the scarcity and variance of dynamic gesture datasets of interrogative phrases in ISL, a self-recorded dataset was constructed. With the Adam optimizer, their technique was 99.6% accurate. Table 1 summarises the existing works.

Table 1 Summarization of existing works

3 Preliminaries

This section provides background information on the deep learning techniques utilised with LDA and GDA. Following that, the transfer learning theory is covered. Finally, we outline the creation of a deep ensemble.

3.1 Generalized discriminant analysis (GDA)

Generalised discriminant analysis (GDA) is a nonlinear discriminant analysis that makes use of a kernel function operator. Its underlying theory closely resembles that of support vector machines: GDA maps the input vectors into a high-dimensional feature space. It then maximises the ratio of between-class scatter to within-class scatter to find a projection of the variables into a lower-dimensional space.

3.2 Linear discriminant analysis (LDA)

LDA is a generalisation of Fisher’s linear discriminant, a technique widely used in machine learning, pattern recognition, and statistics. Its goal is to find a linear combination of features that distinguishes between two or more classes of objects. LDA represents the data so that class separability is maximised: the projection places objects of the same class close together and objects of different classes far apart.
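As a rough illustration of this idea, the sketch below computes Fisher’s discriminant direction for two classes of 2-D points in plain Python. The toy points, the function names, and the restriction to two classes and two features are assumptions made for illustration only; the paper applies LDA to image features and does not give its implementation.

```python
# Minimal two-class Fisher LDA in 2-D (illustrative sketch, toy data).

def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, m):
    """2x2 scatter matrix of the points around their mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    """Direction w = S_w^{-1} (m1 - m0) that maximises class separability."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(points, w):
    """Project each point onto the direction w (one scalar per point)."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]

# Hypothetical toy classes, not data from the paper.
CLASS_0 = [(1, 2), (2, 3), (3, 3), (2, 1)]
CLASS_1 = [(6, 5), (7, 8), (8, 6), (7, 7)]

w = fisher_direction(CLASS_0, CLASS_1)
print(project(CLASS_0, w))
print(project(CLASS_1, w))
```

On this toy data, every projected value of the first class lies below every projected value of the second, which is the separation the projection is meant to achieve.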

3.3 Deep learning methods

The following three deep learning techniques are employed:

3.3.1 Xception

Xception is a 71-layer deep convolutional neural network architecture created by researchers from Google. It uses depthwise separable convolutions. According to its authors, Inception modules in convolutional neural networks serve as an intermediate step between ordinary convolution and the depthwise separable convolution operation [32].
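To give a sense of why depthwise separable convolutions are attractive, the following sketch compares the weight count of an ordinary convolution with that of a depthwise separable one. The layer sizes are hypothetical and biases are ignored; the numbers are not taken from the Xception architecture itself.

```python
# Weight counts for one convolutional layer (biases ignored).

def standard_conv_params(kernel, c_in, c_out):
    """Ordinary convolution: one kernel x kernel x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Depthwise step (one kernel x kernel filter per input channel)
    followed by a 1x1 pointwise convolution that mixes channels."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels.
print(standard_conv_params(3, 128, 256))        # 294912
print(depthwise_separable_params(3, 128, 256))  # 33920
```

For this hypothetical layer the separable form needs roughly one ninth of the weights, which is the kind of saving that motivates the design.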

3.3.2 VGGNet

Simonyan et al. from the University of Oxford proposed the convolutional neural network model known as the VGG network. This architecture achieved a top-5 test accuracy of 92.7% on ImageNet, which has approximately 14 million photos divided into 1000 classes [33].

3.3.3 SqueezeNet

SqueezeNet is an 18-layer convolutional neural network developed by researchers from DeepScale, UC Berkeley, and Stanford. Its inventors set out to design a smaller neural network with fewer parameters that would be simpler to store in computer memory and to transmit across a computer network [34].

3.4 Yoga postures

Yoga is an excellent technique to get in shape and lose weight, even though its primary goal is to quiet the mind. Here are a few positions that can help you feel less anxious and weigh less. Hold each pose for as long as you can, which may initially be 15–20 s. As you practise, add a few seconds to each hold to eventually reach a minute, if you can. When appropriate, complete one side before completing the other. Yoga postures come in a variety of styles [35]. Downdog, Goddess, Plank, Tree, and Warrior2 are categorised in this work. Figure 1 displays some examples of a yoga posture.

3.4.1 Downdog

Downward Dog is yoga’s “poster pose”. It has grown to be the most well-known asana because it is so crucial to modern practice. It can be the first pose you pick up when you start practising yoga, and in most yoga courses it is repeated several times. It serves as both a resting pose and a transitional pose.

3.4.2 Goddess

Goddess is a strong, intermediate-level standing pose also known as the Fierce Angle Pose. It can be included in yoga sequences for the chest, hips, and groin. When done as part of a prenatal yoga routine, Goddess Pose helps to stretch the uterus and prepare for delivery. It can also be incorporated into flow yoga sequences, since it boosts the body’s energy.

3.4.3 Plank

One of the most important poses in yoga is the plank, and there are many good reasons for this. Every time you perform this grounding pose, your core muscles and the strength in your arms and wrists will improve. It also helps to enhance posture by strengthening the muscles that surround the spine. Deep attention and mental fortitude are strengthened in plank pose as well. Holding the position even when your arms begin to tremble serves as a potent reminder that you can face obstacles both on the mat and off it.

3.4.4 Tree

This posture imitates a tree’s ethereal solidity. The Tree Pose differs from most yoga positions in that the eyes must be kept open to maintain balance. If you have a migraine, sleeplessness, or low or high blood pressure, avoid this posture.

3.4.5 Warrior2

Warrior 2 is a grounding, lunging standing yoga pose. It improves physical and mental awareness while enhancing endurance, strength, and power. Some people consider the Warrior 2 position anatomically challenging, since it requires balance and breathing work while the legs move in different directions. It therefore takes a lot of concentration to keep the front leg bent while straightening the back leg. In addition, the pelvis must be kept level and pointing forward.

3.5 Dataset description

The dataset used in this research paper is called “Yoga Pose Classification” and is sourced from Kaggle [36]. It is an image dataset specifically curated to categorize five yoga positions.

Using this dataset for the study has several benefits. First, yoga positions improve physical and mental health, but practising them incorrectly may cause injuries and long-term issues. Accurate and reliable yoga position identification models can give practitioners real-time feedback and adjustments to ensure proper execution and reduce injury risk.

Second, yoga’s growing popularity as a stress-reduction and fitness method makes this dataset important. As more people practice yoga, computerized tools to analyze and guide postures are needed. This dataset serves as a valuable resource for training and evaluating such systems, as it contains diverse and representative images of the five specific yoga poses.

The dataset may affect health and well-being, making it important. This dataset may be used to recognize and correct yoga positions using machine learning methods. This allows practitioners to perform safe and effective yoga, improving fitness, flexibility, and quality of life.

The “Yoga Pose Classification” dataset was chosen for this study work because it might improve yoga practice safety and efficacy.

The dataset produced by Chowdhury et al., which was used to evaluate the performance of the suggested methodology, is freely available on Kaggle [36]. The dataset, which is represented in Figs. 1 and 2, comprises 839 images classified into the categories Downdog, Goddess, Plank, Tree, and Warrior2, with classes 0, 1, 2, 3, and 4, respectively. The proposed framework uses 70% of the images as training data, 20% as validation data, and 10% as testing data.
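A 70/20/10 split of 839 images could be produced along the following lines. This is only a sketch: the shuffling seed, the rounding rule (fractions truncated, remainder assigned to the test set), and the absence of per-class stratification are assumptions, since the paper does not state how the split was drawn.

```python
import random

def split_indices(n_images, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle image indices and cut them into train/validation/test parts.

    Assumption: sizes are truncated to integers and whatever remains
    after the training and validation parts goes to the test set.
    """
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * n_images)
    n_val = int(val_frac * n_images)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train, val, test = split_indices(839)
print(len(train), len(val), len(test))  # 587 167 85
```

Under these assumptions the three parts contain 587, 167, and 85 images; a different rounding rule would shift these counts by one or two images.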

Fig. 1 The dataset’s examples of yoga poses [36]

Fig. 2 Distribution of classes of images in the Yoga dataset [36]

3.6 Deep ensemble

We attempt to optimise the millions of parameters in deep CNNs using stochastic gradient descent or one of its variants. Because the search space contains many local minima, different networks converge to different solutions despite the optimizer’s best efforts. Although the error rates of these networks are comparable, the size of the search space causes them to make distinct errors. This diversity can be exploited by designing ensembling strategies [37].

Transfer learning builds a model in part by reusing a model pretrained on a different task. It can reduce training time and remove the need for expensive processing equipment. Features are extracted from the images using the pretrained model. The deep learning models used here are Xception, VGGNet, and SqueezeNet.

Figures 3 and 4 show the layers of the core model and the proposed yoga posture estimation algorithm for Xception, VGGNet, and SqueezeNet. The fine-tuned transfer learning models are used to extract the features, and the classification employs the softmax activation function.

4 Proposed methods

The core model is shown in Fig. 3, and the suggested ensemble deep learning model is shown in Fig. 4. The three deep learning techniques, Xception, VGGNet, and SqueezeNet, are fed into the core model separately. The networks are pretrained on ImageNet [38] before being fine-tuned on the Yoga dataset. LDA and GDA features are used in the core model, either separately or together.

Fig. 3 The proposed Core Model

Fig. 4 The proposed Yoga pose estimator algorithm

Our approach involves training a residual CNN network, which employs skip connections to allow the output of one layer to serve as input for the following layers. This technique addresses the issues of vanishing gradients and accuracy saturation, which can occur when additional layers are added to a deep model [39]. To improve the network’s performance, we made various modifications to its layers. In particular, we followed each convolutional layer with a corresponding ReLU activation layer.

The architecture of the core model is divided into distinct layers, beginning with an input layer. This is followed by two stacks of convolutional-ReLU layers connected to a max-pooling layer, and then by a larger unit containing three convolutional-ReLU layers with a max-pooling layer. Four convolutional layers come next, and the model concludes with a group of three convolutional layers. The organization of these layers is depicted in Fig. 3. The residual network includes two residual blocks that implement skip connections. These shortcuts perform identity mapping: their outputs are added to the outputs of the stacked layers.
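The identity shortcut described above can be illustrated with a deliberately small sketch. Here a “layer stack” is just a function on a list of numbers; `toy_layers` is a hypothetical stand-in for the convolutional-ReLU stack and does not reproduce the core model of Fig. 3.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]

def residual_block(x, stacked_layers):
    """Return F(x) + x: the stacked layers' output plus the identity shortcut."""
    fx = stacked_layers(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut needs matching shapes")
    return [f + xi for f, xi in zip(fx, x)]

def toy_layers(x):
    # Hypothetical stand-in for a conv-ReLU stack: affine map, then ReLU.
    return relu([0.5 * v - 1.0 for v in x])

x = [1.0, 2.0, 4.0]
print(residual_block(x, toy_layers))  # [1.0, 2.0, 5.0]
```

If the stacked layers output zeros, the block returns its input unchanged, which is why such shortcuts make it easy for added layers to fall back to an identity mapping.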

The four innovative models for estimating Yoga Positions are as follows:

  • Deep: LDA and GDA features are not combined with the DL approaches. Only the ensemble of Xception, VGGNet, and SqueezeNet is used.

  • LDeep (LDA-Deep method): only the LDA output features are concatenated in the core model with one DL method (Xception, VGGNet, or SqueezeNet) at a time, and the results are then ensembled.

  • GDeep (GDA-Deep method): only the GDA output features are concatenated in the core model with one DL method (Xception, VGGNet, or SqueezeNet) at a time, and the results are then ensembled.

  • LGDeep (LDA-GDA-Deep method): the LDA and GDA output features are concatenated in the core model with one DL method (Xception, VGGNet, or SqueezeNet) at a time, and the results are then ensembled.
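The difference between the four variants can be summarised as a choice of which extra features are concatenated with the deep features. The sketch below encodes only that choice; the function name, the feature lengths, and the concatenation order are assumptions for illustration, and the paper does not specify at which layer of the core model the concatenation takes place.

```python
# Which discriminant-analysis features each variant adds to the deep features.
VARIANT_FEATURES = {
    "Deep": (),
    "LDeep": ("lda",),
    "GDeep": ("gda",),
    "LGDeep": ("lda", "gda"),
}

def build_feature_vector(variant, deep, lda, gda):
    """Concatenate deep features with the LDA/GDA features the variant uses."""
    extras = {"lda": lda, "gda": gda}
    features = list(deep)
    for name in VARIANT_FEATURES[variant]:
        features.extend(extras[name])
    return features

# Hypothetical feature vectors for one image and one backbone.
deep_f = [0.1, 0.2, 0.3, 0.4]
lda_f = [1.0, 2.0]
gda_f = [3.0, 4.0]

for variant in VARIANT_FEATURES:
    print(variant, build_feature_vector(variant, deep_f, lda_f, gda_f))
```

The same construction would be repeated for each backbone (Xception, VGGNet, SqueezeNet) before the three predictions are ensembled.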

We first train the pretrained models individually. Majority voting over the predicted yoga positions then determines the final ensemble prediction. The steps of the ensemble model are as follows:

  (1) A yoga dataset of five classes is collected.

  (2) The dataset is divided into three parts: 70% training, 20% validation, and 10% testing.

  (3) The split has the form

    $$\left[F_{tr}, F_{val}, F_{ts}\right] = D_{s} \tag{1}$$

    Here, \(F_{tr}\) is the training set of Yoga, \(F_{val}\) is the validation set, \(F_{ts}\) is the testing set, and \(D_{s}\) denotes the five-class Yoga dataset.

  (4) X refers to Xception, Xception-LDA, Xception-GDA, or Xception-LDAGDA, depending on the architecture employed.

  (5) V refers to VGGNet, VGGNet-LDA, VGGNet-GDA, or VGGNet-LDAGDA.

  (6) Q refers to SqueezeNet, SqueezeNet-LDA, SqueezeNet-GDA, or SqueezeNet-LDAGDA.

  (7) The DL models X, V, and Q are applied to the testing dataset (\(F_{ts}\)) as

    $$\begin{array}{rcl} X_{s} & = & T_{l}\left(X, S\right), \\ V_{s} & = & T_{l}\left(V, S\right), \\ Q_{s} & = & T_{l}\left(Q, S\right). \end{array} \tag{2}$$

    \(X_{s}\), \(V_{s}\), and \(Q_{s}\) denote the softmax outputs of X, V, and Q, respectively, S denotes the softmax function, and \(T_{l}\) indicates the deep transfer learning of X, V, and Q.

  (8) Each trained individual deep transfer learning model is defined as

    $$\begin{array}{rcl} X_{s} & = & P_{a}\left(X_{s}, F_{tr}\right), \\ V_{s} & = & P_{a}\left(V_{s}, F_{tr}\right), \\ Q_{s} & = & P_{a}\left(Q_{s}, F_{tr}\right). \end{array} \tag{3}$$

    Here, \(P_{a}\) defines the model’s architecture process.

  (9) The majority vote procedure is used to build the ensemble:

    $$E_{D} = M_{V}\left(X_{s}, V_{s}, Q_{s}\right) \tag{4}$$

    Here, \(E_{D}\) denotes the ensemble model and \(M_{V}\) denotes the majority voting over the individual models.
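The majority vote of Eq. 4 can be sketched in a few lines of Python. The function name and the tie-breaking rule are assumptions: with three models and five classes all three votes can differ, and the paper does not say how such ties are resolved, so this sketch falls back to the first model’s prediction.

```python
def majority_vote(*model_predictions):
    """Combine per-sample class predictions from several models.

    Each argument is a list of predicted class labels, one per sample.
    Assumption: ties are broken in favour of the earliest model listed.
    """
    n_samples = len(model_predictions[0])
    if any(len(p) != n_samples for p in model_predictions):
        raise ValueError("all models must predict the same number of samples")
    ensemble = []
    for votes in zip(*model_predictions):
        # max() returns the first label among those with the highest count.
        ensemble.append(max(votes, key=votes.count))
    return ensemble

# Hypothetical predictions of X, V, and Q for three test images.
x_pred = [0, 1, 2]
v_pred = [0, 1, 3]
q_pred = [4, 1, 3]
print(majority_vote(x_pred, v_pred, q_pred))  # [0, 1, 3]
```

An alternative design would average the softmax outputs of the three models instead of voting on hard labels; the text describes hard majority voting, so that is what the sketch follows.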

The model takes input images of size 380 × 380 × 3 and is trained with the Adam optimizer for 100 epochs, with a batch size of 8, a learning rate of 0.001, and cross-entropy loss. Results are averaged over 5 folds of the ensemble methods.
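For readers unfamiliar with the loss, the sketch below shows how softmax probabilities and the cross-entropy loss are computed for a single image. The logits are hypothetical values, not outputs of the trained models, and the hyperparameter dictionary simply restates the settings given in the text.

```python
import math

# Training settings as stated in the text.
HYPERPARAMETERS = {
    "input_shape": (380, 380, 3),
    "optimizer": "Adam",
    "epochs": 100,
    "batch_size": 8,
    "learning_rate": 0.001,
    "folds": 5,
}

def softmax(logits):
    """Turn raw class scores into probabilities that sum to one."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[true_class])

# Hypothetical logits for the five yoga classes; the true class is 3 (Tree).
probs = softmax([0.5, -1.0, 0.2, 3.0, 0.1])
print(probs)
print(cross_entropy(probs, 3))
```

A model that assigns equal probability to all five classes has a loss of ln 5, roughly 1.61, which is a useful reference point when reading training curves.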

5 Performance analysis

Table 2 displays the experimental results, in percent, obtained on the testing data, averaged over 5 folds of the ensemble methods. The Receiver Operating Characteristic (ROC) curve analysis of the developed ensemble deep learning model is shown in Fig. 5.

Table 2 Experimental results obtained on the testing data

Fig. 5 ROC test of classes

Figure 6 presents the Precision Recall (PR) test for classes 0, 1, 2, 3, and 4. Better PR values were reached with the suggested model, LGDeep. The given methodology can therefore be applied to Yoga classification successfully.

Fig. 6 Precision-Recall test of classes

The testing evaluations for the four suggested deep learning models are shown in Fig. 7. The best suggested model, LGDeep, achieves test results of 100% in terms of recall, precision, and F-measure. According to the analysis, the suggested method outperforms the current models overall by 0.51, 0.5, 0.51, and 0.51% in terms of accuracy, precision, F-measure, and recall, respectively.

Fig. 7 Comparison of the test dataset measurements percentages

The experimental analysis demonstrates that the LGDeep model outperforms all other models in Table 2.
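The precision, recall, and F-measure reported here can be derived from a confusion matrix as sketched below. The matrix values are hypothetical, and macro-averaging over classes is an assumption, since the paper does not state which averaging scheme was used.

```python
def per_class_metrics(confusion):
    """Precision, recall, and F-measure per class.

    confusion[t][p] counts samples of true class t predicted as class p.
    """
    n = len(confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp
        fn = sum(confusion[c]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        results.append((precision, recall, f_measure))
    return results

def macro_average(results):
    """Unweighted mean of each metric over all classes (assumed scheme)."""
    n = len(results)
    return tuple(sum(r[i] for r in results) / n for i in range(3))

# Hypothetical two-class confusion matrix, not results from the paper.
toy_confusion = [[5, 0],
                 [1, 4]]
print(per_class_metrics(toy_confusion))
print(macro_average(per_class_metrics(toy_confusion)))
```

A confusion matrix with all test images on the diagonal yields 1.0 for every metric, which corresponds to the 100% figures reported for LGDeep.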

5.1 Compared to current models

Table 3 compares the LGDeep model to other earlier models that make use of yoga poses. The table shows that LGDeep has the highest accuracy.

Table 3 Comparison with previous approaches to Yoga Pose classification

6 Conclusions

The paper presented and compared four innovative yoga pose recognition models. LGDeep, which uses deep transfer learning and ensemble techniques, is the best of these yoga posture classification models. It achieved 100% classification accuracy, exceeding previous similar studies and models. LGDeep’s specificity and sensitivity exceed those of other techniques, demonstrating its usefulness. The dependability and accuracy of the LGDeep model make it a highly suitable candidate for a yoga position recognition system. The recommended technique might improve yoga practitioners’ health and safety due to its strong classification capabilities.

LGDeep’s application goes beyond yoga position recognition. Its concepts and methods may be adapted to solve many classification problems in other fields. This adaptability makes the recommended technique suitable for fitness tracking, physical therapy, sports performance analysis, and other domains.

Future research and development may include extending the dataset to incorporate more postures, variants, and body types, which can improve the model’s generalizability and robustness.

To increase accessibility, the Yoga posture recognition system might include a user-friendly interface. Users may simply interact with the system, visualize their positions, and track their progress over time.