1 Introduction

Recognition of human body gestures is a rapidly growing research domain with applications in human–computer interaction. Gestures are powerful and natural means of verbal as well as non-verbal communication in which body activities convey meaningful messages [1]. Gesture recognition means identifying expressive human motion, such as movements of the fingers, arms, hands, head, and body [1]. It is an inter-disciplinary research area, and dance gesture recognition is one of its sub-domains: the recognition of meaningful human expressions used as a medium of communication in dance and drama. Hand movements are the first and most fundamental element to learn in any kind of classical dance. The hand movements used in classical dance forms are commonly known as mudras or hastas. The single-hand gestures used in classical dances are known as Asamyukta hastas, while the double-hand gestures, which are derived from the Asamyukta hastas, are known as Samyukta hastas. According to the Sangeet Natak Academy [2], there are eight officially recognized classical dance forms in India. Among them, the oldest classical dance is Bharatnatyam and the youngest is Sattriya. Although Sattriya was introduced in the 15th century, it was recognized as a classical dance only in the year 2000. As per the Sangeet Natak Academy, twenty-eight single-hand gestures are used in Bharatnatyam and twenty-nine in Sattriya classical dance [3], as shown in Fig. 1. In the state of the art, hand gestures are of two types: dynamic hand gestures and static hand gestures. Dynamic hand gesture recognition deals with moving hands, whereas static hand gesture recognition deals with stationary hand expressions.
The only algorithm reported in the literature for classifying static single-hand gestures of Sattriya classical dance is the two-level classification algorithm [4]. Although this algorithm gives promising results at the first level, its performance at the second level is not satisfactory, and it requires a trial-and-error method to choose the optimal features. Another recent work [5] is based on vision-based features, but it is limited to only five hand gestures (mudras). Since the Asamyukta hastas of Sattriya classical dance closely resemble one another, the chance of misclassification is very high. To address these issues, a Multilevel Classification Model with vision-based features (MCM-\(V_b\)F) is proposed to classify the Asamyukta hastas of Sattriya classical dance. The model works as a tree-like structure: at each level, features are selected automatically and the search space is narrowed until the mudra or hasta is completely recognized at a leaf node. This paper also proposes an entropy-based similarity measure to assess the correctness of a performed Asamyukta hasta. The similarity measure provides a similarity score between the input Asamyukta hasta image and hasta images performed by experts in the database. The proposed model is capable of classifying the single-hand gestures of Sattriya classical dance with high accuracy.

Fig. 1
figure 1

Twenty-nine Asamyukta hastas in Sattriya classical dance

1.1 Research contribution

The highlighted contributions are as follows:

  1. 1.

    The proposed Multilevel Classification Model with Vision-based Features (MCM-\(V_b\)F) is presented to classify the twenty-nine (29) single-hand gestures of Sattriya classical dance.

  2. 2.

    To support the proposed model, eight vision-based features are extracted, and the extraction method of each feature is explained in detail. These features have wide applicability in any kind of gesture recognition.

  3. 3.

    The model helps to reduce the over-fitting problem and also lowers the misclassification rate.

  4. 4.

    An information-theoretic similarity measure is introduced to support the classifier.

  5. 5.

    The proposed MCM-\(V_b\)F model is evaluated against three benchmark classifiers, namely Naive Bayes, Decision Tree, and Support Vector Machine (SVM), and gives promising performance in all cases.

  6. 6.

    The proposed model gives better performance on both our own Sattriya classical dance Single-Hand Gestures (SSHG) dataset and the Bharatnatyam classical dance Single-Hand Gestures (BSHG) dataset.

1.2 Motivation

This research work has been carried out with the following motivations:

  • To digitize and preserve Sattriya, one of the eight Indian classical dance forms, and to reduce the scarcity of datasets in this area.

  • In the current technological era, art spreads very rapidly within and across countries. This research will help propagate our cultural heritage by providing a tool to promote it across countries.

  • This research will help to build a bridge between the arts and technology.

  • Self-learning and e-learning are the main motivations behind this work.

1.3 Issues and challenges in this domain

The main challenges in this domain are the scarcity of datasets and the similarity between hand gestures, which leads to misclassification. Moreover, as per the state of the art, most research on dance hand gesture recognition has been carried out on the Bharatnatyam [2, 6,7,8] and Odissi [9, 10] dance forms. To the best of our knowledge, no significant work has been done on Sattriya classical dance. Our previous work on recognition of Sattriya hand gestures was a two-level classification algorithm [4], which was used to recognize the twenty-nine single-hand gestures of Sattriya classical dance. Although promising accuracy is obtained at the first level, many misclassifications occur at the next level. On the other hand, popular techniques such as the Scale Invariant Feature Transform (SIFT), Histogram of Oriented Gradients (HOG), and K-Nearest Neighbors (KNN) are used for static hand gesture recognition. Although their accuracy is very high for general hand gesture recognition, these algorithms give a high misclassification rate for Sattriya classical dance single-hand gesture recognition. In addition, the above-mentioned algorithms rely on a trial-and-error method for feature selection. In contrast, deep learning approaches based on neural networks, viz., the Convolutional Neural Network (CNN) and the Artificial Neural Network (ANN), use automatic feature extraction, where hidden layers perform the feature selection. Therefore, to resolve these issues, the Multilevel Classification Model with Vision-based Features (MCM-\(V_b\)F) is proposed. The model works as a hierarchical tree-like structure: at each level, features are selected automatically with a minimum search space.
At the next level, the search space is narrowed down until each hand gesture (hasta) of Sattriya classical dance is individually recognized at a leaf node. Again, from the state of the art, it is known that the Bharatnatyam classical dance dataset used in [6] is not publicly available. In that paper, it was reported that 24 out of the 28 hand gestures were correctly classified. Out of these 28 hand gestures, 15 are identical across all classical dances. The method reported in [6] for single-hand gestures of Bharatnatyam misclassified 4 hand gestures on its own dataset; among these 4 misclassified hand gestures, 3 are correctly classified by our proposed model. In this paper, the proposed MCM-\(V_b\)F model focuses primarily on the twenty-nine single-hand gestures of Sattriya classical dance. The model is also verified by testing on the Bharatnatyam classical dance Single-Hand Gestures (BSHG) dataset.

1.4 Problem statement

The main contribution of this paper is a Multilevel Classification Model with Vision-based Features (MCM-\(V_b\)F) for single-hand gesture recognition in Indian classical dance. The model is primarily implemented on Sattriya classical dance single-hand gestures and is also verified on Bharatnatyam classical dance single-hand gestures. It provides a systematic hierarchical solution for the recognition of each individual hand gesture of Sattriya classical dance, where each hand gesture is considered an individual class. Instead of searching over all twenty-nine hand gestures linearly for each input, it searches over a smaller group based on feature-level matching. This paper also proposes a set of eight vision-based structural features that represent the shape of hand gesture images. Although these features are applied here to recognize single-hand gestures of Sattriya classical dance, they are extendable to similar applications such as smart home automation and traffic control. Additionally, this paper proposes an entropy-based similarity measure to verify the correctness of single-hand gestures during a dance performance. Initially, the entropy of each class is calculated, where a class comprises hand gesture images performed by experts. The similarity of an input hand gesture image to a class is then computed as the change in entropy after adding the input hasta image to that class. The overall algorithm is tested on the Medial Axis Transformation (MAT) image dataset.

The rest of the paper is organized as follows. A brief survey of related works is presented in Sect. 2. Section 3 presents the proposed method with eight new vision-based structural features, along with the proposed classification algorithm for recognition of single-hand gestures of Sattriya classical dance and the similarity score computation. Experimental results of the proposed algorithm on Sattriya and Bharatnatyam classical dance are presented in Sect. 4. Finally, Sect. 5 concludes the paper with a comparative study and future research directions.

2 Literature survey

Dance gesture recognition is a very young field of research in computer science, yet it has gained significant attention within a short duration among the research community across the globe [2, 6,7,8,9,10]. Efron [11] was the first to work on gesture analysis in psychology. As per the state of the art, the available gesture recognition research in the dance domain has mostly been done on the following dance forms:

  • Bharatnatyam [2, 6,7,8] is the oldest classical dance. The authors analyzed dance gesture recognition on the basis of a two-level decision-making system, working only with a single-hand gesture dataset.

  • In [9, 10], the authors worked on Odissi classical dance. They worked on whole-body gestures using sensor technology, using a Kinect sensor for image acquisition, and their work considered only eleven coordinates out of twenty different skeleton joints.

  • Another work is on Bali traditional dance [12], where a probabilistic grammar-based classifier was used for dance gesture classification. This work is also sensor-based and recognizes full-body gestures.

  • In [2], the authors work on the popular ballet dance, using sensor-based technology where a multilevel classifier system was employed to classify different dance postures.

  • The literature also reports work on another popular community dance, Kazakh traditional dance [13]. In this dance, head movement plays a major role, so the authors mainly focused on head gestures.

  • Dance gesture recognition from video [14] used a very small dataset of only 800 images and eight classes. The objective of this work was to recognize Indian classical dance forms.

  • Another hand gesture recognition system using machine learning [15] has been reported; it used only four types of single-hand gestures.

  • A recent work [5] on classical dance hand gesture recognition gives better accuracy; however, it is also limited to five hand gestures only.

In the literature, to the best of our knowledge, the oldest algorithm for understanding human body gesture segmentation is hierarchical activity segmentation [16]. The authors worked with the physical segmentation of human body parts (anatomy), representing the human body as 22 segments, each with independent gestures. Their technique works on the basis of an inheritance property: lower-level segments acquire the properties of higher-level segments. Their main challenge was to account for gesture variation from all angles of view; however, their algorithm was limited to a finite number of states. Another work on gesture recognition of Bharatnatyam mudras is reported in [6]. Bharatnatyam is the most popular dance form of India, with which nearly 70% of classical dancers are familiar. This group developed a repository module to classify the twenty-eight Asamyukta mudras. Their objective was to promote this dance globally, and in their paper they claimed that only 24 hand gestures were correctly classified. In [12], the authors focused on Bali traditional dance; their aim was to develop a powerful recognizer based on a linguistically motivated method.

The authors of the work reported in [9] proposed a system for Indian classical dance. Their system was developed using different types of sensors. The final output of their research was a device that automatically generates a skeleton from the gestures, from which twenty different joints and coordinates are obtained in the form of 3D images. Their system was very ambitious, and they also worked on emotion detection, extracting features to detect moods such as anger, fear, happiness, sadness, and relaxation.

In [17], the authors discuss how classical dance and its interpretation can be analyzed using deep learning techniques. They developed a deep learning-based computational framework that relies on deciphering and understanding the emotions, expressions, and intentions of dancers. The model successfully classified different stances and dance movements that clearly indicate the intent of the dancer. The presented approach opens perspectives toward challenging applications in digital heritage preservation and the technological enhancement of cultural performances.

The paper [18] describes a deep learning-based framework for analysis and interpretation of Indian Classical Dance. The focus of the study is majorly on recognizing and classifying the various dance movements and other complicated gestures. The findings clearly reveal the potential of the presented model in identifying various forms and postures of dances with high accuracy, depicting the power of deep learning in making a stride toward deciphering and preserving the rich cultural heritage of Indian Classical Dance. This work falls within the juncture of technology and traditional arts, which contributes new methods of documenting and analyzing dance performances.

The paper [19] describes the identification and interpretation of the emotions of Indian classical dance through deep learning. It develops a model for the recognition of nine principal emotions (Rasas) as expressed through traditional dances. It has been found that the model can categorically recognize the development of those emotions based on the dancers' facial expressions, gestures, and movements. This work provides evidence of how deep learning can enhance the analysis and preservation of emotional expressions in Indian classical dance, thereby helping in two major areas: studies related to cultural heritage and artificial intelligence.

In [20], the author presents a system to monitor and analyze the activities of students during online lectures with the help of computer vision and IoT technologies. The study has presented a framework that can track student engagement levels, attention, and participation through video feeds and sensor data analysis. From the findings, it has been shown that the system can effectively detect and quantify different activities on the part of students, such as attentive or interactional, which may provide insight into the behavior of the students during an online class. This will help in improvising online education by giving real-time feedback on the engagement created by students.

In [7], the authors proposed a methodology for classifying different mudras of classical dance with emotional description. They used various image processing and pattern recognition algorithms in their research. Their work is applicable to any kind of Indian classical dance hand gesture. They used machine learning techniques, training the system and testing it with an unknown dataset. For segmenting the main object from the background they used a hybrid saliency algorithm. They worked on both Asamyukta and Samyukta hastas (mudras), and used a benchmark classifier, viz., the k-nearest neighbour algorithm, for classification. Their entire methodology was based on a vision-based mechanism.

In [13], research was done on Kazakh traditional dance gestures, with the main focus on head movements. Like [9], the authors used sensor-based technology to detect the skeleton and collect detailed information about it, such as joints, angles between joints, and distances between different joints of the body. They used both clustering and classification techniques, namely a Bayesian network and k-means clustering with expectation maximization, and also used the popular Hidden Markov Model (HMM) algorithm to identify different body postures. The HMM is a popular benchmark classifier; the Naïve Bayes classifier and the Decision Tree are two other benchmark classifiers that have gained popularity for their simplicity. The most widely used classification algorithm is the Support Vector Machine (SVM), a supervised classifier based on a decision boundary known as a hyperplane. Each data item is plotted in an n-dimensional space, where n is the number of features and each coordinate corresponds to a feature value. The main task of this algorithm is to draw a line or a hyperplane that distinctly separates the data points. The coordinates of an individual observation are called a support vector, and with the help of the hyperplane the support vectors are accurately classified on the basis of the selected data features. A limitation of the SVM, however, is that it works only on labeled data.

The second benchmark classifier is the Naïve Bayes classifier. It is a simple classifier based on Bayes' theorem and is therefore also known as a "probabilistic classifier". It assumes independence among the present features. Although this classifier is very simple to execute for basic purposes, it can also provide estimates with higher accuracy using a kernel density estimation algorithm. Essentially, the algorithm is a collection of several sub-algorithms and is applicable when all sub-algorithms work independently of each other to classify the features. As per the literature survey, its fundamental tasks are text classification, image classification, etc.; however, a limitation of this algorithm is that it requires a high-dimensional dataset.

The third benchmark classifier is the Decision Tree, a supervised learning algorithm. It represents a graphical solution to a problem based on given conditions. A Decision Tree is constructed from internal nodes, which represent the feature tests; branches, which represent the decision rules; and leaf nodes, which generate the output. A Decision Tree thus contains two kinds of nodes: decision nodes and leaf nodes. Decision nodes support decision making with their branches, while the leaf nodes are responsible only for the final outcome and lie farthest from the root of the tree. Although Decision Trees are used for both classification and regression problems, they are most suitable for classification. In any Decision Tree algorithm, based on binary (Yes/No) answers to the conditions, the traversal moves from the root node to a leaf node.

Features play a very important role in recognizing any kind of dance gesture; without extracting good features, it is not possible to recognize a dance gesture correctly. For example, the hand gestures of any kind of classical dance have very similar patterns, so extracting the most discriminant features is a very challenging task. In addition, extracting features from two-dimensional images of a three-dimensional object is difficult because of the high chance of information loss, and it becomes even more challenging when the images are corrupted by noise and distortion [21]. In the state of the art, moment-based features such as Hu's moments, Zernike moments, and various shape features can detect objects easily. These features are mainly categorized as global features, local features, and contour features [22, 23]. However, these features are not sufficient to properly classify our Sattriya classical dance single-hand gesture dataset, due to its very similar hand gesture patterns. Therefore, new features are required to classify this dataset. In this paper, we contribute eight vision-based feature extraction methods, which are briefly discussed in the next section.

3 Proposed method

This paper proposes a multilevel classification model with vision-based features (MCM-\(V_b\)F) to recognize the 29 single-hand gestures of Sattriya classical dance. The proposed model is also verified on Bharatnatyam classical dance single-hand gestures and provides promising results on both the Sattriya and the Bharatnatyam single-hand gesture datasets.

Fig. 2
figure 2

Framework of MCM-\(V_b\)F model

The steps involved in the proposed approach, as shown in Fig. 2, are preprocessing, Medial Axis Transformation (MAT), feature extraction, classification, and finally similarity score computation. Each step of this framework is briefly explained in the following subsections.

3.1 Preprocessing

The preprocessing phase is very important for any image processing technique, as real-world data are generally imperfect and inconsistent. To produce normalized and noise-free images, the following steps are performed in the preprocessing phase:

  • Step 1: Image cropping and size normalization: In the first step, the captured RGB image is cropped so that only the palm portion is retained and the other portions are ignored. The RGB image is then resized to 200 × 200 pixels.

  • Step 2: Background subtraction: The background of the image is subtracted using a Gaussian Mixture Model (GMM).

  • Step 3: RGB-to-binary conversion: The RGB image is converted to a gray image using the library function rgb2gray; the gray image is then converted to binary using an automatic threshold value.

  • Step 4: Boundary image extraction: To remove noise from the binary image, a Gaussian filter of size 25 × 25 is applied, and the boundary of the image is then extracted using the MATLAB library function 'bwboundary'.
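Steps 3 and 4 above can be sketched in a few lines. The following is a minimal NumPy illustration (not the paper's MATLAB pipeline): the mean intensity stands in for the automatic threshold, and a 4-neighbour test stands in for the boundary extraction of `bwboundary`.

```python
import numpy as np

def to_binary(gray, thresh=None):
    """Binarize a grayscale image; if no threshold is given, use the
    mean intensity as a simple automatic threshold (a stand-in for an
    Otsu-style graythresh)."""
    if thresh is None:
        thresh = gray.mean()
    return (gray > thresh).astype(np.uint8)

def boundary(binary):
    """Keep foreground pixels that touch the background in
    4-connectivity (a stand-in for MATLAB's bwboundary)."""
    padded = np.pad(binary, 1)
    core = padded[1:-1, 1:-1]
    min_neighbor = np.minimum.reduce([
        padded[:-2, 1:-1], padded[2:, 1:-1],   # up, down
        padded[1:-1, :-2], padded[1:-1, 2:],   # left, right
    ])
    return ((core == 1) & (min_neighbor == 0)).astype(np.uint8)

# A tiny synthetic "palm" blob: a 3x3 square of bright pixels
gray = np.zeros((7, 7))
gray[2:5, 2:5] = 200.0
b = to_binary(gray)
edge = boundary(b)
print(int(edge.sum()))  # 8: all blob pixels except the centre are boundary
```

The real pipeline additionally applies GMM background subtraction and a 25 × 25 Gaussian filter, which are omitted here for brevity.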

Fig. 3
figure 3

Steps in preprocessing phase with Ardhasuchi hasta

An example of the preprocessing phase with the Ardhasuchi hasta (a single-hand gesture of Sattriya classical dance) is shown in Fig. 3.

3.2 Medial axis transformation (MAT)

After the preprocessing phase, the Medial Axis Transformation extracts, from the boundary image, the set of points having more than one closest boundary point; this finally gives the skeleton of the object. The algorithm to compute the MAT image is shown in Algorithm 1.

Algorithm 1
figure a

Algorithm to convert RGB image to MAT image
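As a rough illustration of what Algorithm 1 computes, the following hedged NumPy sketch builds a brute-force distance map of a small binary image; the ridge (local maxima) of this map approximates the medial axis. It is illustrative only and may differ from the exact MAT procedure of Algorithm 1.

```python
import numpy as np

def distance_map(binary):
    """Brute-force Chebyshev distance from every foreground pixel to the
    nearest background pixel. The ridge (local maxima) of this map
    approximates the medial axis; this is an illustrative sketch, not
    the exact procedure of Algorithm 1."""
    bg = np.argwhere(binary == 0)
    dist = np.zeros(binary.shape)
    for r, c in np.argwhere(binary == 1):
        dist[r, c] = np.min(np.maximum(abs(bg[:, 0] - r), abs(bg[:, 1] - c)))
    return dist

# A 7x7 image containing a solid 5x5 square
img = np.zeros((7, 7), dtype=np.uint8)
img[1:6, 1:6] = 1
dist = distance_map(img)
ridge = dist == dist.max()  # crude medial-axis ridge
print(dist.max(), int(ridge.sum()))  # 3.0 1: the centre pixel is deepest
```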

An example of the medial axis transformation phase for one hand gesture, viz., the Sashaka hasta, is shown in Fig. 4.

Fig. 4
figure 4

Steps in medial axis transformation phase for Sashaka hasta

3.3 Feature extraction

In this section, eight vision-based structural features are proposed to recognize the Asamyukta hastas (single-hand gestures) of Sattriya classical dance. Vision-based features are features that represent the shape of an object (the hand) in an image and can also be visualized by the naked eye [24]. These features are invariant with respect to transformation. The proposed vision-based features are categorized into high-level features and low-level features [25]. The high-level features are used to classify all 29 hand gestures into sub-groups, whereas the low-level features are used to recognize individual single-hand gestures within the sub-groups. The high-level features are (i) Euler number, (ii) angle between fingers, (iii) finger-tips distance, and (iv) edge length; the low-level features are (v) number of edges from a cycle, (vi) number of edges from the border, (vii) edges from the border merging into a cycle, and (viii) number of branch points. All the above-mentioned features are applied to the skeletal images extracted in the MAT phase, as discussed in Section 3.2. The methods to extract these features are briefly discussed in supplementary Section 7.
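As an illustration of the first high-level feature, the Euler number of a binary hand silhouette can be computed as the number of connected objects minus the number of holes. The following is a hedged pure-Python/NumPy sketch; the paper's own extraction method is given in the supplementary section.

```python
import numpy as np
from collections import deque

def count_components(mask):
    """Count 4-connected components of True pixels with a BFS flood fill."""
    seen = np.zeros(mask.shape, dtype=bool)
    n = 0
    for start in map(tuple, np.argwhere(mask)):
        if seen[start]:
            continue
        n += 1
        seen[start] = True
        q = deque([start])
        while q:
            r, c = q.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not seen[nr, nc]):
                    seen[nr, nc] = True
                    q.append((nr, nc))
    return n

def euler_number(binary):
    """Euler number = objects - holes; a hole is a background component
    that does not touch the (zero-padded) image border."""
    padded = np.pad(binary, 1)
    objects = count_components(padded == 1)
    holes = count_components(padded == 0) - 1  # minus the outer background
    return objects - holes

# A ring-shaped silhouette: one object enclosing one hole -> EN = 0
ring = np.ones((5, 5), dtype=np.uint8)
ring[2, 2] = 0
print(euler_number(ring))  # 0 -> single-hole group
```

A solid silhouette with no hole gives EN = 1, and each additional hole lowers EN by one, which matches the grouping rule used by the classifier.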

3.4 Proposed multilevel classification model (MCM-\(V_b\)F)

The proposed multilevel classification model with vision-based features (MCM-\(V_b\)F) is a supervised learning model used for recognition of the single-hand gestures of Sattriya classical dance. The model works like a hierarchical tree structure, where internal nodes represent the features, branches represent the decisions, and each leaf represents the output, i.e., the recognized hasta (mudra). Since the model works on vision-based features that mimic human reasoning, it is easily understandable. In this paper, the model is applied to the twenty-nine single-hand gestures (Asamyukta hastas) of Sattriya classical dance, where each hand gesture is considered a single class. The proposed MCM-\(V_b\)F model can be explained in the form of a tree structure, as shown in Fig. 5.

Fig. 5
figure 5

Conceptual diagram of MCM-\(V_b\)F model

In this figure, the internal nodes of the tree represent groups of single-hand gestures of Sattriya classical dance and the leaf nodes represent individual hastas. The features used in this approach are categorized into high-level and low-level features. The high-level features, viz., Euler Number (EN), Edge Length (EL), Finger-Tips Distance (TD), and Angle between Fingers (AF), are used for group recognition, while the low-level features, viz., Number of Edges from a Cycle (EC), Number of Edges from the Border (EB), Edges from the Border Merging into a Cycle (EBMC), and Number of Branch Points (BP), are used for individual hasta recognition within a hasta group. The high-level features are used at the internal nodes to classify a hasta image into a hasta group, while the low-level features are used at the leaves of the hierarchy to recognize the hasta image within the group represented by the predecessor nodes. The working procedure of the proposed MCM-\(V_b\)F model is given in Algorithm 2.

Algorithm 2
figure b

Algorithm for MCM-\(V_b\)F model

The algorithm automatically selects features to classify the 29 hastas into groups and then into individual hastas. The input of the algorithm is an RGB image and the output is the recognized individual hasta. The input image is converted to a Medial Axis Transformation (MAT) image using the function ConvertRGBtoMAT presented in Algorithm 1. The converted MAT image is then used for hasta group recognition using high-level feature matching, as discussed in the next subsection.

3.4.1 High level feature matching

The output of the high-level feature matching step is the recognized hasta group. The first grouping of hastas is based on the Euler number (EN) feature, i.e., all 29 hastas are categorized into a no-hole, single-hole, or multiple-hole group as follows:

  • if \(\hbox {EN}>0\), no-hole group

  • if \(\hbox {EN}=0\), single-hole group

  • if \(\hbox {EN}<0\), multiple-hole group

The no-hole group hastas are further classified into an open-finger group or a close-finger group based on the Edge Length (EL) feature. If the average EL of the input hasta is greater than a threshold (selected from the observed values), the edge is considered long and the hasta is categorized into the open-finger group; otherwise, it goes into the close-finger group. The open-finger group is then classified as having either a wide gap or almost no gap between the fingers, depending on the Angle between Fingers (AF) and \(finger\_tips\) distance (TD) features. If the angle between adjacent fingers other than the thumb is greater than 45\(^\circ \) and the \(finger\_tips\) distance is greater than a threshold (experimentally determined from a range of possible values), the input hasta is categorized into the wide-gap group; otherwise, into the slight-gap group. In this way, the twenty-nine (29) hastas are categorized into five (5) groups based on the high-level features. The next subsection discusses how individual hastas are recognized using the low-level features.
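The high-level grouping rules can be summarized as a small decision function. This is a sketch under assumed threshold values: `EL_THRESHOLD` and `TD_THRESHOLD` are illustrative stand-ins, not the paper's experimentally chosen thresholds.

```python
# Illustrative thresholds only; the paper determines these experimentally.
EL_THRESHOLD = 40.0   # assumed average edge-length threshold (pixels)
TD_THRESHOLD = 30.0   # assumed finger-tips distance threshold (pixels)

def high_level_group(en, avg_edge_length, angle_between_fingers, tips_distance):
    """Route a hasta into one of the five high-level groups."""
    if en < 0:
        return "multiple-hole"
    if en == 0:
        return "single-hole"
    # EN > 0: no-hole group, split by average edge length
    if avg_edge_length <= EL_THRESHOLD:
        return "close-finger"
    # open fingers: split by angle between fingers and tips distance
    if angle_between_fingers > 45 and tips_distance > TD_THRESHOLD:
        return "wide-gap"
    return "slight-gap"

print(high_level_group(en=-1, avg_edge_length=0, angle_between_fingers=0,
                       tips_distance=0))  # multiple-hole
```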

3.4.2 Low level feature matching

After the input hasta is classified into one of the hasta groups as described in the previous subsection, the low-level features are used to recognize the individual hasta. Recognition of an individual hasta within a group proceeds as follows:

  1. 1.

    Single-hole group: The hastas within the single-hole group are recognized using the features number of edges from a cycle (EC), number of edges from the border (EB), edges from the border merging into a cycle (EBMC), and edge length (EL). The feature values of the single-hole group hastas are shown in Table 1.

  2. 2.

    Multi-hole group: Recognizing the individual hastas inside the multi-hole group is quite easy: by counting the number of holes present in the hasta, the Granika hasta and the Mustika hasta can be distinguished.

    • if EN\(=\)-1, Granika hasta

    • if EN\(=\)-2, Mustika hasta

  3. 3.

    Close-finger group (fingers not fully open): The hastas within the close-finger group are recognized using the features number of branch points (BP) and number of edges from the border (EB). The values of these features for the close-finger group hastas are shown in Table 2.

  4. 4.

    Fingers fully open group: The hastas within the fingers fully open group are further sub-classified based on the angle between the fingers, i.e., whether a gap exists among the fingers or not. The two sub-groups are the wide-gap group and the slight-gap group.

    1. 4.1.

      Wide-gap group: The hastas within the wide-gap group are recognized by counting the number of fingers. A simple algorithm is used to count the number of open fingers from the upper half of the MAT images.

      Since these hastas are fully open, it is possible to divide the hasta horizontally into two halves, ignore the lower part, and count the number of straight fingers. Examples of hastas in this group are Ardhasuchi, Shasaka, Tantrimukha, and Chatura, as shown in Table 3.

    2. 4.2.

      Slight-gap group: Recognition of the slight-gap group hastas is almost the same as for the wide-gap hastas; however, the conversion from hasta images to MAT images is different. To recognize the slight-gap group, a different threshold is used to make the fingers more distinguishable; the suitable value \(1.9 \times graythreshold\) was determined experimentally. Examples include Kartarimukha, Sangdangsha, and Pataka. The output steps of the gray-to-MAT conversion for the wide-gap and slight-gap group hastas are shown in Figs. 6 and 7, respectively.

Table 1 Features values of hastas having single-hole group
Table 2 Feature value of hand gestures having close-finger group
Table 3 Feature value of hand gestures having wide-gap group
Fig. 6

Steps of conversion from gray image to MAT image for the wide-gap hasta group

Fig. 7

Output steps of conversion from gray image to MAT image for the slight-gap hasta group

The algorithm narrows down the search space at each level of the hierarchy until a hasta is completely recognized at a leaf node. In this way, all twenty-nine hastas can be recognized using the multilevel classification model.
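The top-level traversal can be sketched as a chain of feature tests; the dictionary keys below are illustrative stand-ins for the extracted measurements, not the paper’s identifiers, and the returned strings stand in for the per-group recognizers described above:

```python
def route_to_group(features):
    """Route a hasta to its group from extracted features (a sketch;
    the feature names here are illustrative, not the paper's own)."""
    if features["holes"] == 1:
        return "single-hole"   # recognized via EC, EB, EBMC, EL (Table 1)
    if features["holes"] > 1:
        return "multi-hole"    # Granika vs. Mustika via hole count
    if not features["fully_open"]:
        return "close-finger"  # recognized via BP and EB (Table 2)
    if features["wide_gap"]:
        return "wide-gap"      # count open fingers in the upper half
    return "slight-gap"        # re-thresholded MAT, then finger count
```

At a leaf, the corresponding group recognizer would then identify the individual hasta.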

3.4.3 Complexity analysis

The complexity analysis of this algorithm is as follows:

  1.

    RGB image to MAT image conversion needs linear time, i.e., O(n) complexity.

  2.

    Matching feature values in the if/else conditions requires comparing a constant number of values, i.e., O(1) complexity.

  3.

    SingleHoleGroup(image) needs O(\(N^2\)) complexity, where N is the number of rows and columns, considering a square image matrix.

  4.

    If multiple holes are present, only a constant amount of time is required, so the complexity is O(1).

  5.

    WideGapGroup(image) involves O(N) complexity, where N is the number of columns in the image.

  6.

    SlightGapGroup(image) involves the same complexity as the previous function, WideGapGroup(image).

  7.

    CloseFingerGroup(image) involves O(\(N^2\)) complexity, where N is the number of rows and columns, considering a square image matrix.

Among these steps, the largest amount of time is needed when the fingers are not fully open or a single hole is present, since the whole image must then be examined; the O(\(N^2\)) term therefore dominates. Hence, the worst-case complexity of the proposed method is O(\(N^2\)).

3.5 Similarity score computation phase

This section presents a method for computing a similarity score to measure the correctness of a hasta performed by a dancer. The method uses Shannon’s entropy measure to compare a hasta with a database of hastas performed by experts. Initially, the entropies of the 29 hasta classes are computed, where each hasta class consists of hasta images performed by experts of Sattriya classical dance. In the second step, the input hastas (feature representations) are classified using the entropy-based similarity measure. For each input instance, the new entropy of each class is computed after including the instance, which gives a set of new entropy values. Next, these values are compared with the old values, and the instance is assigned to the class for which the entropy variation is minimum. The steps are summarized as follows:

  1.

    For each class of training instances:

    • Compute entropy using Shannon’s entropy measure.

    • Store the entropy values in an array, Arrayold[].

  2.

    For each input test instance:

    • Include the instance in a class of training instances.

    • Compute the entropy for that class.

    • Repeat for all other classes and store the entropy values in Arraynew[].

    • Compare the entries for each class between Arrayold[] and Arraynew[] and store their differences in Arraydiff[].

    • Calculate similarity score = (1 - Arraydiff) * 100.

The entropy for the class is given by

$$\begin{aligned} {Entropy}\big ({C}_{i}\big )= \frac{1}{L}\sum \limits _{r=1}^{L}{Entropy}\big ({C}_{i}^{r}\big ) \end{aligned}$$
(1)

where L is the number of samples for each class.
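The steps above can be sketched as follows. For simplicity, the per-class entropy here is computed over a flat list of quantized feature values, a stand-in for Eq. (1), which averages the entropy over the L samples of each class; the feature representation is illustrative:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def similarity_scores(class_samples, test_sample):
    """For each class, include the test instance, recompute the entropy,
    and turn the absolute entropy change into a percentage score.
    `class_samples` maps a class name to a list of (quantized) feature
    values from expert images."""
    scores = {}
    for name, samples in class_samples.items():
        old = shannon_entropy(samples)                   # Arrayold entry
        new = shannon_entropy(samples + [test_sample])   # Arraynew entry
        diff = abs(new - old)                            # Arraydiff entry
        scores[name] = (1 - diff) * 100                  # similarity score
    return scores

def assign_class(class_samples, test_sample):
    """Assign the instance to the class with minimum entropy variation,
    i.e. the maximum similarity score."""
    scores = similarity_scores(class_samples, test_sample)
    return max(scores, key=scores.get)
```

An instance whose features match a class’s expert samples leaves that class’s entropy essentially unchanged, yielding a score near 100.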

4 Results and discussion

The performance of the proposed multilevel classification model with vision-based features (MCM-\(V_b\)F) on the Sattriya classical dance Single-Hand Gestures (SSHG) dataset and the Bharatnatyam classical dance Single-Hand Gestures (BSHG) dataset is measured in terms of the confusion matrix and the receiver operating characteristic (ROC) curve. The confusion matrix represents the prediction results of a classifier with respect to the true values, and the ROC curve plots the true positive rate (sensitivity) against the false positive rate. Here, a true positive is an image assigned to its true class, and a false positive is an image assigned to a wrong class.

4.1 Dataset description

During the pandemic, learning became largely digital: it was not possible to attend dance classes physically, and dance teachers are not available throughout the region. Sattriya classical dance is traditionally taught in the Satras, so there is a scarcity of Sattriya dance teachers (Gurus) worldwide. The main motivation behind developing this dataset is to support the digitization of the dance. The Sattriya classical dance Single-Hand Gestures (SSHG) dataset is used to evaluate the proposed MCM-\(V_b\)F model. The SSHG dataset includes a total of 44,950 images (43,500 noisy images + 1450 original images) [26]. The twenty-nine Asamyukta hastas of Sattriya classical dance are performed as described and approved by several famous Granthas (epics) of Indian classical dance, viz., Natya Sastra, Abhinaya Darpan (the Mirror of Gestures), Sangeet Ratnakar and Srihasta Muktaboli, as follows:

  • 20 hastas, namely Pataka, Padmokosha, Mustika, Hangshamukha (Hangsasya), Alpadma, Tripataka, Karatarimukha, Ardhachandra, Sarpashira, Sandangsha, Suchimukha, Urnanava, Mukula, Chatura, Tamrachuda, Kopittha, Bhramara, Khatkhamukha and Sashak (Mrigasirsha), are similar in all of these Granthas [6].

  • 3 hastas, namely Ardhasuchi, Singhamukha and Trishula, are from Abhinaya Darpan [3].

  • 4 hastas, namely Ankusha, Tantrimukha, Granika and Krishnasarmukha, are from Sri-hasta Muktaboli.

  • 2 hastas, Dhanu and Ban, are from Kalikapuran and Abhinaya Darpana.

    The twenty-nine Asamyukta hastas of Sattriya classical dance are shown in Fig. 1. The hasta names and corresponding numeric labels are given in Table 4.

4.1.1 Description of SSHG dataset

The self-generated Sattriya classical dance Single-Hand Gestures (SSHG) dataset contains 44,950 images, of which 1450 are original images. Additional instances of the hasta images were generated by image distortion methods, namely the addition of noise and blurring. Image noise is a random variation of brightness or color information. Here, for each original image, 30 noisy instances were created. With these additional instances, the number of hasta images in the dataset becomes 44,950 (43,500 noisy images + 1450 original images). The images were collected from three different Satras of Assam, viz., Nikamul, Kamalabari and Auniati. The hasta names and corresponding numeric labels are given in Table 4.
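The noise-based augmentation described above can be sketched as follows. The noise model and sigma are illustrative choices (the text specifies only random brightness variation, not a particular distribution):

```python
import random

def add_noise(image, sigma=10.0, rng=None):
    """Return a noisy copy of a grayscale image (list of rows of 0-255
    ints): Gaussian perturbation of each pixel, clamped to the valid
    range. The Gaussian model and sigma are illustrative assumptions."""
    rng = rng or random.Random()
    return [[min(255, max(0, round(p + rng.gauss(0, sigma))))
             for p in row] for row in image]

def augment(image, n_variants=30, seed=0):
    """Generate the 30 noisy variants per original image used to grow
    the dataset from 1450 to 44,950 images (seeding is illustrative)."""
    rng = random.Random(seed)
    return [add_noise(image, rng=rng) for _ in range(n_variants)]
```

With 1450 originals and 30 variants each, this yields 1450 + 1450 * 30 = 44,950 images, matching the dataset count.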

Table 4 Numeric label of Sattriya classical dance Asamyukta hastas

4.1.2 Description of BSHG dataset

The Bharatnatyam classical dance Single-Hand Gestures (BSHG) dataset consists of 280 images of 28 Mudras, i.e., 10 images for each mudra. The images of the Bharatnatyam mudras were captured in the same environment in which the SSHG dataset was collected, as discussed in the previous subsection. The mudra names and corresponding numeric labels are given in Table 5.

Table 5 Numeric label of Bharatnatyam mudras

A few other datasets available in recent literature can be found in [19, 26, 27].

4.2 Experimental results

This subsection presents the performance evaluation of the multilevel classification model (MCM-\(V_b\)F) on the Sattriya classical dance Single-Hand Gestures (SSHG) dataset [26] and the Bharatnatyam classical dance Single-Hand Gestures (BSHG) dataset. On the SSHG dataset, the model is also compared with three popular benchmark classifiers: Naive Bayes, decision tree and support vector machine (SVM).

4.2.1 Performance evaluation on SSHG dataset using proposed MCM-\(V_b\)F model

For this experiment, the images of the SSHG dataset [26] were separated into training and testing sets: 75% (34,464 images) of the image data were used for training and the remaining 25% (11,486 images) for testing. The classification accuracy of Sattriya classical dance hastas using the proposed model is shown in Table 6.

Table 6 Classification result of Sattriya classical dance hasta using (MCM-\(V_b\)F) model

The confusion matrices in Fig. 8 for the single-hand gestures of the “Fingers not fully open” class are used to calculate individual precision and recall scores for the hasta classes. The MCM-\(V_b\)F model exhibits the best overall performance for recognizing the single-hand gestures of Sattriya classical dance, with fewer misclassifications and higher recognition accuracy compared with the decision tree, SVM, and Naive Bayes models. The Naive Bayes model shows the highest confusion, while SVM and the decision tree provide a balance of accuracy and misclassification across gestures.

Fig. 8

Confusion matrices of four classifiers for single-hand gestures of “Fingers not fully open” class

To show the performance of the classification models (the MCM-\(V_b\)F model, Naive Bayes, decision tree, and SVM) on the SSHG dataset, their receiver operating characteristic (ROC) curves are presented in Fig. 9. From this graph, it can be observed that the MCM-\(V_b\)F model shows the strongest performance, with a nearly perfect curve that stays close to the top-left corner, indicating high sensitivity and a low false positive rate. The Naive Bayes model also performs well, though with some minor fluctuations, reflecting a slightly lower but still strong classification ability. The SVM model shows good initial performance but does not maintain as high a true positive rate as Naive Bayes and MCM-\(V_b\)F throughout. The decision tree’s ROC curve indicates moderate performance, with more variability and a higher false positive rate compared to the other models. Overall, the Naive Bayes and MCM-\(V_b\)F models exhibit the best predictive capabilities in this classification task, while the SVM and decision tree lag behind.

Fig. 9

Comparison of ROC curves for the four classifiers

The ROC curves of some individual classes are shown in Fig. 10. These individual ROC curves represent the accuracy of classification within each group of hastas.

The classification accuracy of class 6, Bhomora (single-hole hasta group), is the highest at 98.52%, while class 8, Dhanu (fingers not fully open hasta group), shows the most misclassification, with the lowest accuracy of 87.97%. The overall classification accuracy of the MCM-\(V_b\)F model is 94%.

Fig. 10

ROC curves for individual hastas of each group

4.2.2 Performance evaluation on BSHG dataset using proposed MCM-\(V_b\)F model

Sample images from the BSHG dataset are shown in Fig. 11. For this experiment, 280 images were collected, 10 for each mudra. These images were divided into a training set of 210 images and a testing set of 70 images. To evaluate the results, the true positive (tp), true negative (tn), false positive (fp) and false negative (fn) counts are used, where a true positive is an image that is correctly classified and a false positive is an unexpected classification. Applying the precision formula, \(precision = tp/(tp+fp)\), to this dataset shows that the proposed MCM-\(V_b\)F model classifies the BSHG dataset with 87% accuracy; that is, out of 28 mudras, it correctly classifies 26. The classification accuracy of Bharatnatyam mudras using the proposed algorithm is shown in Table 7.
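As a worked instance of the precision formula above (the counts are illustrative, not per-class figures from Table 7):

```python
def precision(tp, fp):
    """Precision = tp / (tp + fp): the fraction of predicted positives
    that are actually correct."""
    return tp / (tp + fp)

# Illustrative counts: 26 correct out of 28 predictions.
print(round(precision(26, 2), 2))  # -> 0.93
```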

Fig. 11

Twenty eight mudras in Bharatnatyam classical dance

Table 7 Classification result of Bharatnatyam mudra using (MCM-\(V_b\)F) model

4.2.3 Similarity score computation

After the classification step, the entropies of all the individual classes of the SSHG dataset are calculated to find the similarity score of each individual hasta in the next phase of the research. A snapshot of the MATLAB result for the Dhanu hasta of the SSHG dataset is shown in Fig. 12.

Fig. 12

Example of Dhanu Hasta Recognition with Similarity Score

5 Conclusion and future direction

An effective multilevel classification model with vision-based features (MCM-\(V_b\)F) has been proposed to classify hand gestures of classical dance, with special focus on the Sattriya classical dance hand gestures of Assam. An information-theoretic similarity measure has also been reported to support the classification process. The proposed model has been found capable of classifying all the hand-gesture groups (hasta groups) of Sattriya classical dance, and the individual hand gestures within these groups, with high accuracy. The model has also been validated on the Bharatnatyam classical dance Single-Hand Gestures (BSHG) dataset. Before classification proceeds, eight vision-based structural features are extracted from the Sattriya classical dance Single-Hand Gestures (SSHG) dataset, and a separate description has been given for each extracted feature. Applications of the extracted features are not limited to the SSHG and BSHG datasets; they also extend to a wide variety of tasks such as recognition of sign languages, traffic signals, or any other kind of hand gesture. Future research will focus on extending the dataset, mapping it to the cloud, and pushing the recognition accuracy closer to 100%.