1 Introduction

School events are a common cultural activity that takes place inside or outside the school grounds. They aim to open up school units to the culture of their community, while enhancing aesthetic cultivation, intellectual development and creative expression, and sensitizing pupils to topics related to values, nation, and culture. They take numerous forms (e.g. exhibitions, music events, theatrical performances, dance, literary events, etc.), which involve various parties both within and outside education. The most common event is the theatrical performance, since it is considered crucial for students’ expression through cultural activities. It is a complex form of creation, cultural action, communication, and artistic expression that pursues both artistic and pedagogical goals.

Theatrical performance in school is likewise a common cultural activity that takes place inside or outside the school grounds, opening up school units to the culture of their community while enhancing the aesthetic cultivation, intellectual development and creative expression of pupils, as well as their sensitization to topics related to values, nation, and culture. As an educational activity, the theatrical performance encourages and supports schoolchildren’s connection and creative communication with their nation and culture. In this way, the school’s role in shaping young characters and the role of the educator are both enhanced, and new perspectives are brought to the teaching process. Moreover, preparing a school theatrical event requires conditions that are not always clear or acknowledged.

Staging a theatrical performance is anything but easy, as social or individual issues often inhibit a person’s attempts to accomplish personal goals (Nordin-Bates, 2012). Performance anxiety is one of those factors. It refers to the feelings that a person may experience while participating in theatrical events, which by their nature entail social exposure (Barkley, 2012; Brennan, 2020; Wilson & Roland, 2002; Wright, 1999). Although it is commonly believed that stress can have a positive effect when its level is high enough to motivate the individual to try harder, exposure to unmanageable levels of stress may harm mental or physical health and is often followed by potentially pathological problems. Such an experience is reported as performance anxiety, stage fright or stage anxiety, and is a negative emotional experience observed in situations where individuals are exposed to the public (Scott, 2017). It arises from anticipating a negative situation or a negative assessment of the performance by the audience (Goodman & Kaufman, 2014). It is a temporary emotional pattern, marked by worry and tension, that is caused by environmental stress (Spielberger, 1966) and directly reflects the immediate psychological reaction to a particular situation. Owing to these properties, state anxiety is appropriate not only for monitoring the psychological state of children but also for revealing potential anxiety in mentally healthy ones.

Performance anxiety refers to a persistent anxiety that develops during performances and is not connected with gender, educational level, age, skill, or preparation level (Doğan & Palanci, 2015; Papageorgi et al., 2013). It can affect individuals regardless of their age and skill level (Boucher & Ryan, 2011; Papageorgi et al., 2013; Steptoe et al., 1995). According to Ascenso, Perkins, and Williamon (2018) and Merritt, Richards, and Davis (2001), it is a condition that affects people in a variety of activities, including public speaking, sports, and performing arts such as dancing and acting. As Thomson and Jaque (2017) mention, individuals are frightened of making a disastrous mistake or of being negatively assessed. If individuals are unable to cope with frightening emotions and somatic reactions, their negative feelings keep growing until they experience a panic attack (Fernholz et al., 2019; Guyon et al., 2020). Such an experience has a negative impact on performance as well. Following a panic attack, the concern that upcoming performances may be sabotaged by performance anxiety grows. Artists become sensitized to both the cognitive and somatic features of anxiety, rather than adjusting to performance situations (Barnard et al., 2011; Thomson & Jaque, 2017). For example, they may become stressed enough to develop dyspnea symptoms, which could be linked to a prior panic attack episode or to having witnessed another person struggling to breathe during a panic attack. Anxiety may also cause arrhythmic heartbeats or even a heart attack. Feelings of anxiety also intensify when memories of past negative events are recalled (Kiliç et al., 2008; Thomson & Jaque, 2017).

All the definitions mentioned above consider three factors that play a key role in symptoms (state-trait theory of anxiety) (Psychountaki et al., 2003; Spielberger, 1966) and that sustain a continuously disturbing experience in performance circumstances (Ascenso et al., 2018; Kenny, 2011; Papageorgi & Kopiez, 2012; Wilson, 2002). Performance anxiety is a condition that individuals experience when performing before an audience, regardless of their characteristics (Doğan & Palanci, 2015; Langendörfer et al., 2006). It thus constitutes a phenomenon close to fright, which combines behavioral, cognitive, and physiological responses and stems from a real danger or from various unknown causes. Performance anxiety is also classified as a social phobia (Ascenso et al., 2018).

Individuals may have to overcome various worries in order to sustain their character during the performance, which is itself a cause of anxiety (Papageorgi, 2021). They need to enjoy their performance experiences. During school theatrical events, the intense feelings triggered by performance anxiety have an undesirable impact on the participation of the schoolchildren involved. Wilson and Roland (2002) mention that an actor may develop anxiety, which translates into apprehension or fear about performing. Placement or reading, memorization, speaking abilities, instrument abilities, grip, and appearance can also be affected by performance anxiety (Meijer & Oostdam, 2011). In certain cases, this emotional state has a negative impact on the pupil’s skills. Furthermore, regarding prevalence rates, Studer et al. (2011) reported that 12% of music students deal with high anxiety on stage, Papageorgi (2022) reported the same for 11% of adolescent musicians, and Steptoe et al. (1995) for 10% of professional actors. Moreover, studies show that prevalence rates for childhood social anxiety disorder range from 0.5 to 9.0% in pediatric primary care samples and community studies (Hitchcock et al., 2009).

Consequently, it is necessary to develop practices that identify schoolchildren at risk of performance anxiety, so that teachers can develop coping practices. However, such a procedure is considered extremely hard due to its multi-criteria complexity. The main concerns are how to detect those cases effectively, how to adopt safe criteria for the multi-criteria evaluation of schoolchildren’s anxiety, and how much time such a task requires. Identifying such individuals early is crucial. According to reported findings, the effectiveness of treating performance anxiety relies mostly on its early identification, so that it can be addressed with suitable interventions.

Under those circumstances, Computational Intelligence (CI), Artificial Intelligence (AI), and Machine Learning (ML) approaches can support the educator, since they provide fast and effective solvers for complicated problems that cannot be addressed by conventional methods (Kar, 2016; Yang et al., 2021). Regardless of the particular application, AI-based technologies depend on the detection of patterns within large data sets (Ćosić et al., 2020; Zhou et al., 2020). Thus, AI has a transformational power (Hwang et al., 2020). In this context, detecting high levels of performance anxiety in schoolchildren can be considered a non-deterministic polynomial-time hard (NP-hard) problem, since numerous objectives are taken into account for the individual assessment (Kusserow et al., 2012; Mastrothanasis et al., 2021; Qaddoura et al., 2021; Zervoudakis et al., 2020). As a result, the main purpose of the current research is to identify high levels of performance anxiety in schoolchildren through learner segmentation using CI methods, and more specifically metaheuristic-based ones.

The rest of the paper is organized as follows: In Sect. 2, a review of the literature when it comes to anxiety detection using ML methods is demonstrated, while in Sects. 3 and 4, the Mayfly optimization algorithm and the purpose of the study are described, respectively. In Sect. 5 the method to be used is presented and in Sect. 6 the results are demonstrated. Finally, in Sects. 7 and 8, a discussion as well as the main conclusions of the research are presented.

2 Artificial intelligence and anxiety detection

Machine learning methods have already been used for anxiety detection (Muhammad et al., 2020; Priya et al., 2020). For instance, Khan et al. (2021) detected psychological disorders by successfully applying deep learning based models to the recognition of human behaviors pertaining to anxiety. Moreover, Pintelas et al. (2018), who reviewed the literature on applying machine learning methods to the detection of anxiety disorders, showed that common machine learning methods like logistic regression, random forest and Support Vector Machine (SVM) can be used to detect anxiety disorders. In particular, Chatterjee et al. (2014) used Logistic Regression, Naive Bayes and a Bayesian Network to predict generalized anxiety disorder. For the same purpose, Chen et al. (2015) used a Bayesian joint model with a linear mixed effects model for the longitudinal measurements and a generalized linear model for the binary primary endpoint, Hilbert et al. (2017) used a binary SVM within a nested leave-one-out cross-validation framework, and Dabek and Caban (2015) used a neural network model.

When it comes to posttraumatic stress disorder, Saxe et al. (2017) performed a comparative analysis of support vector machine, Lasso, Logistic regression and Linear Regression to predict the anxiety of 163 children. Further details can be found in the work of Khan et al. (2021). Moreover, Galatzer-Levy et al. (2014) attempted to improve the prediction of posttraumatic stress disorder with machine learning predictors, using a set of 957 trauma survivors assessed within 10 days of a traumatic event. Furthermore, Omurca and Ekinci (2015) followed a hybrid system of standard machine learning techniques to classify posttraumatic stress disorder, combined with three popular feature selection methods: chi-square, principal component analysis and correlation-based feature selection. Through this approach, they determined important indicators of patients’ trauma and offered an alternative approach for predicting posttraumatic stress disorder. The accuracy of this method varied from 74 to 79%.

Social anxiety disorder, as well as panic disorder and agoraphobia, are also reported to be predicted using machine learning predictors. For instance, Liu et al. (2015) used multivariate pattern analysis, while Zhang et al. (2015) used a linear support vector machine to classify patients according to their levels of social anxiety. Finally, Lueken et al. (2015) applied a machine learning approach to separate depressive comorbidity from panic disorder.

Furthermore, the literature review revealed that binary classifiers were used to diagnose children’s anxiety in the work of McGinnis et al. (2019), while Ding et al. (2022) used a dataset of paired self-reported state anxiety levels to train and validate machine learning models like linear regression, support vector regression, LASSO regression, and ensemble of trees, in order to predict state anxiety. Salekin et al. (2018) used a weakly supervised learning framework to detect social anxiety, and Sau and Bhakta (2019) used approaches like Logistic Regression, Random Forest, Naive Bayes, SVM and Catboost for the early detection and treatment of anxiety disorders. Finally, Xiong et al. (2021) used a feature ensemble based Bayesian neural network for the prediction of anxiety disorders and Mastrothanasis et al. (2021) used a bio-inspired metaheuristic-based algorithm to classify students according to their levels of state anxiety.

3 Bio-inspired computation intelligence

Bio-inspired optimization algorithms are popular and effective computational intelligence tools for addressing complex optimization problems (Yang, 2020). The development of nature-inspired algorithms has become very popular during the last two decades. Bio-inspired algorithms such as Genetic Algorithms (GA), ant colony algorithms, bat algorithms, bee algorithms, Firefly Algorithms (FA), cuckoo search, the flying foxes optimization algorithm and Particle Swarm Optimization (PSO) have been applied in various research fields, with a dramatic increase in the number of relevant publications (Askarzadeh, 2016; Attaran et al., 2021; Faramarzi et al., 2020; Yang, 2020; Zervoudakis & Tsafarakis, 2022; Zitouni et al., 2021). Related algorithms can generally be separated into four groups: (a) evolutionary based bio-inspired algorithms, (b) swarm intelligence-based bio-inspired algorithms, (c) ecology-based bio-inspired algorithms and (d) multi-objective bio-inspired algorithms (Fan et al., 2020).

According to Torres-Jiménez and Pavón (2014), algorithms of this kind can address various optimization problems with numerous requirements, owing to their ability to explore the search space, escape from local optima, and determine whether a solution is good enough for the decision maker. Furthermore, they allow the decision maker to explore the trade-off between performance and solution quality. As a result, they are applied in various real-life applications.

According to José-García and Gómez-Flores (2016) and Ezugwu (2020), nature-inspired clustering algorithms have been successfully applied to complex real-world engineering problems as well as to cluster analysis. For instance, the GA is a widely used evolutionary algorithm that has been applied to several real-world optimization problems such as location problems (Chappidi & Singh, 2022), data clustering problems (Chehouri et al., 2017), and flow shop scheduling problems (Liang et al., 2022). Differential Evolution (DE), another evolutionary algorithm, has been applied to complex optimization problems in computer science and engineering (Tsafarakis et al., 2020; Xiang et al., 2015), while numerous DE hybrid variants have also been used to solve various data clustering problems (Ezugwu et al., 2022). Furthermore, various researchers have applied the PSO algorithm to problems such as hierarchical and partitional data clustering (Rana et al., 2011), owing to its advantages in convergence rate and convergence speed. Finally, FA, which is based on the flashing behavior of fireflies, is another successful metaheuristic algorithm with hundreds of recorded applications to complex engineering problems (Fister et al., 2013) and data clustering problems (Senthilnath et al., 2011).

3.1 Mayfly optimization algorithm

The Mayfly optimization Algorithm (MA) (Zervoudakis & Tsafarakis, 2020) is a recently proposed optimization algorithm that researchers have already used to address NP-hard problems (Alblehai et al., 2022; Bhattacharyya et al., 2020; Guo et al., 2021; Gupta & Gehlawat, 2022; Liu et al., 2021; Roni et al., 2022; Tamilmani et al., 2022; Zhao & Gao, 2020). The main reason behind the wide application of this algorithm is that it combines major advantages of both evolutionary and swarm intelligence optimization algorithms. In particular, MA combines advantages of PSO, GA and FA (Zervoudakis & Tsafarakis, 2020). As a result, MA’s performance is superior to that of other state-of-the-art metaheuristics when addressing complex objectives.

In this algorithm, each mayfly’s position represents a potential solution to the problem. Firstly, a set of female and a set of male mayflies are generated at random, as \( \varvec{x}=({x}_{1},\dots ,{x}_{d})\), whose performance is assessed on the predefined objective function f(x). Considering \( {x}_{i}^{t}\) as the present position of mayfly i in the objective space during step t, it is altered by summing it with a velocity \( {v}_{i}^{t+1}\), as:

$$ {x}_{i}^{t+1}={x}_{i}^{t}+{v}_{i}^{t+1}$$
(1)

with \( {x}_{i}^{0}\sim U\left({x}_{min},{x}_{max}\right)\). The velocity \( v\) of a male mayfly \( i\) is computed as:

$$ {v}_{ij}^{t+1}=\begin{cases}g\,{v}_{ij}^{t}+{a}_{1}{e}^{-\beta {r}_{p}^{2}}\left({pbest}_{ij}-{x}_{ij}^{t}\right)+{a}_{2}{e}^{-\beta {r}_{g}^{2}}\left({gbest}_{j}-{x}_{ij}^{t}\right), & \text{if } f\left(gbest\right)>f\left({x}_{i}\right)\\ {v}_{ij}^{t}+d\,r, & \text{otherwise}\end{cases}$$
(2)

while for female mayflies as:

$$ {v}_{ij}^{t+1}=\begin{cases}g\,{v}_{ij}^{t}+{a}_{2}{e}^{-\beta {r}_{mf}^{2}}\left({x}_{ij}^{t}-{y}_{ij}^{t}\right), & \text{if } f\left({y}_{i}\right)>f\left({x}_{i}\right)\\ {v}_{ij}^{t}+fl\,r, & \text{otherwise}\end{cases}$$
(3)

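where \( {v}_{ij}^{t}\) is the velocity of mayfly \( i\) in dimension \( j = 1,\dots ,n\) at time step \( t\), and \( {a}_{1}\) and \( {a}_{2}\) are positive attraction constants. In Eq. (3), \( {y}_{i}\) denotes the position of a female mayfly and \( {x}_{i}\) that of the male it is paired with. \( {pbest}_{ij}\) is the best position mayfly \( i\) has ever visited and \( gbest\) is the overall best solution found so far. Moreover, \( \beta \) is a fixed visibility coefficient, and \( {r}_{p}\), \( {r}_{g}\) and \( {r}_{mf}\) are the Cartesian distances of a mayfly from \( {pbest}_{i}\), from \( gbest\) and from its mate, respectively, each computed as the distance between \( {x}_{i}\) and the corresponding point \( {X}_{i}\):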

$$ ||{x}_{i}-{X}_{i}||=\sqrt{\sum _{j=1}^{n}{\left({x}_{ij}-{X}_{ij}\right)}^{2}}$$
(4)

Finally, \( g\) is a constant in the range \( \left(0,1\right]\), \( d\) is a positive nuptial dance coefficient, \( fl\) is the random flight coefficient, and \( r\) is a random value in the range \( \left[-1,1\right]\).
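Eqs. (1)–(4) can be illustrated with a minimal Python/NumPy sketch for a single mayfly. The function names and argument layout are hypothetical, the default g = 0.8 is an arbitrary value inside the stated range, and drawing \( r\) separately for each dimension is an assumption. The branch test of Eqs. (2) and (3) is passed in as a boolean, so the sketch does not commit to a particular fitness convention.

```python
import numpy as np

def male_velocity(v, x, pbest, gbest, first_branch, g=0.8, a1=1.0, a2=1.5,
                  beta=2.0, d=0.1, rng=None):
    """Eq. (2): new velocity of one male mayfly (1-D arrays, one entry per dimension)."""
    rng = np.random.default_rng() if rng is None else rng
    v, x = np.asarray(v, float), np.asarray(x, float)
    pbest, gbest = np.asarray(pbest, float), np.asarray(gbest, float)
    if first_branch:
        r_p = np.linalg.norm(x - pbest)  # Eq. (4): distance to personal best
        r_g = np.linalg.norm(x - gbest)  # Eq. (4): distance to global best
        return (g * v
                + a1 * np.exp(-beta * r_p ** 2) * (pbest - x)
                + a2 * np.exp(-beta * r_g ** 2) * (gbest - x))
    # Nuptial dance: r drawn in [-1, 1] per dimension (assumption).
    return v + d * rng.uniform(-1.0, 1.0, size=x.shape)

def female_velocity(v, y, x_mate, first_branch, g=0.8, a2=1.5, beta=2.0,
                    fl=0.1, rng=None):
    """Eq. (3): new velocity of one female mayfly paired with the male at x_mate."""
    rng = np.random.default_rng() if rng is None else rng
    v, y, x_mate = (np.asarray(a, float) for a in (v, y, x_mate))
    if first_branch:
        r_mf = np.linalg.norm(x_mate - y)  # Eq. (4): distance to the mate
        return g * v + a2 * np.exp(-beta * r_mf ** 2) * (x_mate - y)
    # Random flight: r drawn in [-1, 1] per dimension (assumption).
    return v + fl * rng.uniform(-1.0, 1.0, size=y.shape)

def move(position, new_velocity):
    """Eq. (1): the new position is the old position plus the new velocity."""
    return np.asarray(position, float) + np.asarray(new_velocity, float)
```

The defaults for a1, a2, beta, d and fl are the values reported in Section 6. Because of the factor \( {e}^{-\beta {r}^{2}}\), the pull towards a point weakens quickly as the distance to it grows.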

The mating process between two mayflies is represented by the crossover (genetic) operator as:

$$ \begin{array}{c}offspring1=L*male+\left(1-L\right)*female\\ offspring2=L*female+\left(1-L\right)*male\end{array}$$
(5)

where \( male\) is the male parent and \( female\) is the female one. \( L\) is a random value in the range (0,1). The initial velocities of the offspring are set to zero. A detailed description of the way mayflies move can be found in the work of Zervoudakis and Tsafarakis (2020).
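As a small illustration of Eq. (5), the following Python sketch produces two offspring from one pair of parents. The function name is hypothetical, and drawing a single scalar \( L\) per pair (rather than one value per dimension) is an assumption based on the wording above.

```python
import numpy as np

def mate(male, female, rng=None):
    """Eq. (5): arithmetic crossover of one male and one female position."""
    rng = np.random.default_rng() if rng is None else rng
    male, female = np.asarray(male, float), np.asarray(female, float)
    L = rng.uniform(0.0, 1.0)  # one scalar weight per pair (assumption)
    offspring1 = L * male + (1.0 - L) * female
    offspring2 = L * female + (1.0 - L) * male
    initial_velocity = np.zeros_like(male)  # offspring start at rest
    return offspring1, offspring2, initial_velocity
```

Both offspring lie on the segment joining the two parents, and their sum always equals the sum of the parents, so the operator explores only the region between existing solutions.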

Algorithm 1 Pseudocode of the Mayfly Algorithm (MA)

Objective function of the problem \( f(x)\), \( x={\left({x}_{1},\dots ,{x}_{d}\right)}^{T}\)
Initialize the male mayfly population \( {x}_{i}\left(i=1,2,\dots ,N\right)\)
Initialize the male mayfly velocities \( {v}_{mi}\)
Initialize the female mayfly population \( {y}_{i}\left(i=1,2,\dots ,M\right)\)
Initialize the female mayfly velocities \( {v}_{fi}\)
Evaluate solutions according to the predefined objective function
Find the global best solution (\( gbest\))
Find the personal best solutions of male mayflies (\( pbest\))
Do while stopping criteria are not met
    Update velocities and positions of male mayflies
    Update velocities and positions of female mayflies
    Evaluate solutions according to the predefined objective function
    Rank the mayflies according to the objective function values
    Mate the mayflies
    Evaluate offspring according to the predefined objective function
    Separate offspring into male and female randomly
    Replace worst solutions with the best new ones
    Update \( pbest\) and \( gbest\)
End while
Postprocess results and visualization
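The loop of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading, not the authors’ Matlab implementation. Several choices are assumptions: the objective is minimized, the attraction branch is applied to every male that is worse than \( gbest\), males and females are paired by rank, positions are clipped to the search box, \( gbest\) is tracked from the males’ personal bests only, mutation is omitted, and g = 0.8 is an arbitrary value in (0, 1].

```python
import numpy as np

def mayfly_minimize(f, lower, upper, n=20, iters=100, g=0.8, a1=1.0, a2=1.5,
                    beta=2.0, d=0.1, fl=0.1, seed=0):
    """Minimise f over the box [lower, upper] with n males and n females."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    evaluate = lambda pop: np.array([f(p) for p in pop])

    # Initialise both populations uniformly in the box, with zero velocities.
    x, vx = rng.uniform(lower, upper, (n, dim)), np.zeros((n, dim))  # males
    y, vy = rng.uniform(lower, upper, (n, dim)), np.zeros((n, dim))  # females
    fx, fy = evaluate(x), evaluate(y)
    pbest, fpbest = x.copy(), fx.copy()
    gbest, fgbest = pbest[fpbest.argmin()].copy(), fpbest.min()
    history = []

    for _ in range(iters):
        # Rank both populations (best first) so male i is paired with female i.
        o = np.argsort(fx)
        x, vx, fx, pbest, fpbest = x[o], vx[o], fx[o], pbest[o], fpbest[o]
        o = np.argsort(fy)
        y, vy, fy = y[o], vy[o], fy[o]

        for i in range(n):
            if fx[i] > fgbest:  # male worse than gbest: attraction, Eq. (2)
                rp = np.linalg.norm(x[i] - pbest[i])
                rg = np.linalg.norm(x[i] - gbest)
                vx[i] = (g * vx[i] + a1 * np.exp(-beta * rp ** 2) * (pbest[i] - x[i])
                         + a2 * np.exp(-beta * rg ** 2) * (gbest - x[i]))
            else:               # best male: nuptial dance
                vx[i] = vx[i] + d * rng.uniform(-1, 1, dim)
            if fy[i] > fx[i]:   # female worse than her mate: attraction, Eq. (3)
                rmf = np.linalg.norm(x[i] - y[i])
                vy[i] = g * vy[i] + a2 * np.exp(-beta * rmf ** 2) * (x[i] - y[i])
            else:               # otherwise: random flight
                vy[i] = vy[i] + fl * rng.uniform(-1, 1, dim)

        # Eq. (1), with clipping to keep positions inside the box (assumption).
        x, y = np.clip(x + vx, lower, upper), np.clip(y + vy, lower, upper)
        fx, fy = evaluate(x), evaluate(y)
        better = fx < fpbest
        pbest[better], fpbest[better] = x[better], fx[better]

        # Mating, Eq. (5): one random weight per pair, then a random sex split.
        L = rng.uniform(0.0, 1.0, (n, 1))
        kids = np.vstack([L * x + (1 - L) * y, L * y + (1 - L) * x])
        kids = kids[rng.permutation(2 * n)]
        fk = evaluate(kids)
        zeros = np.zeros((n, dim))

        # Replace the worst solutions: keep the n best of parents plus offspring.
        pool = np.concatenate([fx, fk[:n]])
        keep = np.argsort(pool)[:n]
        x, vx = np.vstack([x, kids[:n]])[keep], np.vstack([vx, zeros])[keep]
        pbest = np.vstack([pbest, kids[:n]])[keep]
        fpbest, fx = np.concatenate([fpbest, fk[:n]])[keep], pool[keep]
        pool = np.concatenate([fy, fk[n:]])
        keep = np.argsort(pool)[:n]
        y, vy, fy = np.vstack([y, kids[n:]])[keep], np.vstack([vy, zeros])[keep], pool[keep]

        if fpbest.min() < fgbest:
            gbest, fgbest = pbest[fpbest.argmin()].copy(), fpbest.min()
        history.append(fgbest)
    return gbest, fgbest, history
```

A call such as `mayfly_minimize(lambda p: float(np.sum(p ** 2)), [-5, -5], [5, 5])` returns the best position found, its objective value, and the best value after each iteration. Since \( gbest\) is replaced only by strictly better solutions, that sequence never increases.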

4 Purpose of the study

The main objective of the current research is to identify high levels of performance anxiety in schoolchildren through learner segmentation using CI methods. Through an MA-based clustering algorithm, dissimilar learner clusters are created, each containing individuals with homogeneous characteristics of performance anxiety. The clustering approach will be applied to an observation data set on the positive, the mild negative and the strong negative feelings of students during their artistic participation in school theatrical events (Mastrothanasis & Kladaki, 2022), based on the instrument for evaluating state performance anxiety of Spielberger et al. (1973). Consequently, these clusters will reveal the schoolchildren who undergo negative emotional experiences. Finally, the performance of the proposed method will be compared with that of two state-of-the-art metaheuristics.

5 Method

5.1 Datasets

To complete this research, quantitative evaluation data on 774 primary school pupils aged 10–12 (average age = 10.53, S.D. = 1.72) from numerous parts of Greece were used; 466 were boys (60.2%) and 308 were girls (39.8%). These students, along with their teachers, participated in a research program conducted throughout Greece by the University of the Aegean during the academic year 2020–2021 (Mastrothanasis & Kladaki, 2022).

Once the teachers were properly trained, they prepared a school play, with their students as the actors, as part of their schools’ cultural activities. During their students’ performance, the teachers observed and captured the feelings of the students, according to three three-point Likert observation scales (Mastrothanasis & Kladaki, 2022). These observation scales were based on the theoretical background of state anxiety for children of Spielberger et al. (1973) (State Anxiety Inventory, STAIC), and each item is scored with 1, 2 or 3 points by the researchers, with higher values indicating more negatively charged emotional states.

The first scale was about positive feelings (Pos) and included five items about emotional states that positively affect a person, such as calmness, pleasantness, happiness and satisfaction. The second scale was about mild negative feelings (Mildneg) and included four items regarding emotional states that negatively affect a person but do not have a great emotional intensity, such as disturbance, nervousness, discomfort, and worry. Finally, the third scale was about the existence of strong negative feelings (Strongneg) and included three items related to adverse emotional states characterized by intense emotional loading, such as panic, terror and fear (Mastrothanasis & Kladaki, 2022). According to the theory of Spielberger et al. (1973) these kinds of emotions shape the actors’ state performance anxiety (Psychountaki et al., 2003).

The three scales described above were included in a single online questionnaire, on which the teachers noted the intensity of the emotions they observed in their students. After completing the questionnaire, the teachers sent their answers to the researchers. Figure 1 shows an indicative part of the questionnaire and the way in which it was completed by the teachers.

Fig. 1 Indicative questions

The descriptive data and the violin plots of the data retrieved from the three adopted measurement scales are presented in Table 1 and Fig. 2, respectively.

Table 1 Descriptive statistics of datasets

Fig. 2 Violin plots for emotion scales

5.2 Research design

The MA was used to address the NP-hard problem of clustering learners according to their performance anxiety.

In this research, it is assumed that \( n\) schoolchildren distinguished by \( m\) characteristics are to be separated into a maximum of \( k\) clusters. A potential solution is therefore a matrix of \( k \times (m+1)\) elements, as Zervoudakis, Mastrothanasis and Tsafarakis (2020) suggest, in which each row represents one cluster. The initial values in the first \( m\) columns of each solution are generated at random within the range of the corresponding characteristic, while the values in the last column lie in \( \left(\text{0,1}\right)\) and determine the number of clusters. Rows with a value less than 0.5 in the last column are excluded, which reduces the number of clusters. The remaining rows, with the last column ignored, are then evaluated. If fewer than two clusters remain, an iterative process begins in which the cluster with the smallest value in the last column is selected, until a total of two clusters is reached. An example of a random potential solution when searching for a maximum of five clusters, using the dataset presented in Subsection 5.1, is shown in Fig. 3.
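The decoding of a potential solution into active centroids can be sketched in Python as follows. The function name and the example values are hypothetical. The fallback for fewer than two active clusters follows the wording above literally, re-admitting the excluded row with the smallest last-column value; the original implementation may use a different selection rule.

```python
import numpy as np

def decode_solution(solution, threshold=0.5, min_clusters=2):
    """Return the centroids kept from a k x (m+1) solution matrix."""
    solution = np.asarray(solution, dtype=float)
    if solution.shape[0] < min_clusters:
        raise ValueError("the solution needs at least min_clusters rows")
    flags = solution[:, -1]        # last column: activation values in (0, 1)
    active = flags >= threshold    # rows below the threshold are excluded
    while active.sum() < min_clusters:
        excluded = np.flatnonzero(~active)
        # Literal reading of the text: select the smallest last-column value.
        active[excluded[np.argmin(flags[excluded])]] = True
    return solution[active, :-1]   # drop the activation column

# Hypothetical 5 x 4 solution for the three scales (Pos, Mildneg, Strongneg).
example = np.array([[1.2, 2.0, 1.1, 0.9],
                    [2.5, 1.4, 2.2, 0.2],
                    [1.8, 2.9, 1.5, 0.4],
                    [2.7, 2.4, 2.6, 0.7],
                    [1.1, 1.3, 1.9, 0.1]])
centroids = decode_solution(example)  # keeps the first and fourth rows
```

Encoding the number of clusters inside the solution lets the optimizer search over cluster counts and centroid positions at the same time, instead of running one optimization per value of \( k\).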

Fig. 3 Random potential solution

Since the values in the last column of the second, third and fifth clusters are less than 0.5, these clusters are excluded together with the entire fourth column, and the final number of clusters is reduced to two, whose centroids are represented in Fig. 4, and will be evaluated by the objective function.

Fig. 4 Random potential solution before assessment

Each mayfly is then assessed using the Davies-Bouldin index (DBI), a commonly used metric for evaluating potential solutions in cluster analysis (Davies & Bouldin, 1979). The goal is to minimize the DBI value. Under this index, a pair of clusters is assessed according to the \( {R}_{i,j}\) value, which is computed as:

$$ {R}_{i,j}\equiv \frac{{S}_{i}+{S}_{j}}{{M}_{i,j}}$$
(6)

where \( {S}_{i}\) and \( {S}_{j}\) are the dispersions of clusters \( i\) and \( j\), respectively, computed as:

$$ {S}_{i}={\left(\frac{1}{{T}_{i}}\sum _{j=1}^{{T}_{i}}{\left|{X}_{j}-{A}_{i}\right|}^{q}\right)}^{1/q}$$
(7)

where \( {T}_{i}\) is the number of individuals in cluster \( i\), \( {X}_{j}\) is the vector of characteristics of individual \( j\), \( {A}_{i}\) corresponds to the centroid of cluster \( i\), and \( q\) is an integer. In the current research \( q = 1\), which makes \( {S}_{i}\) the average Euclidean distance of the individuals in cluster \( i\) to its centroid. Each individual is assigned to the cluster whose centroid is nearest.

The separation between clusters \( i\) and \( j\) is characterized by \( {M}_{i,j}\), the Minkowski metric between their centroids (Friedman & Rubin, 1967):

$$ {M}_{i,j}={\left\{\sum _{k=1}^{N}{\left|{a}_{k,i}-{a}_{k,j}\right|}^{p}\right\}}^{1/p}$$
(8)

where \( {a}_{k,i}\) is the \( {k}^{th}\) element of \( {A}_{i}\), so that the sum in Eq. (8) runs over the components of the centroids. In the current research, \( p\) = 2, which corresponds to the Euclidean distance.

To compute the objective function value, \( \bar{R}\) is calculated as follows:

$$ \bar{R}\equiv \frac{1}{N}\sum _{i=1}^{N}{R}_{i}$$
(9)

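where \( N\) corresponds to the number of clusters to be evaluated, and \( {R}_{i}\) is the largest \( {R}_{i,j}\) value of cluster \( i\) over all other clusters:

$$ {R}_{i}\equiv \underset{j\ne i}{\max}\,{R}_{i,j}$$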
(10)
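The objective function of Eqs. (6)–(10) can be sketched in Python/NumPy as follows, with \( q = 1\) and \( p = 2\) as in the current research. The function name is hypothetical. Two choices are assumptions: dispersions are measured to the centroids encoded in the candidate solution rather than to recomputed cluster means, and a solution that leaves fewer than two non-empty clusters receives an infinite penalty.

```python
import numpy as np

def davies_bouldin(X, centroids):
    """Davies-Bouldin index of the partition induced by the given centroids."""
    X, A = np.asarray(X, float), np.asarray(centroids, float)
    # Euclidean distance of every individual to every centroid (n x k).
    dist = np.linalg.norm(X[:, None, :] - A[None, :, :], axis=2)
    labels = dist.argmin(axis=1)   # assign each individual to the nearest centroid
    used = np.unique(labels)       # empty clusters are ignored (assumption)
    if used.size < 2:
        return np.inf              # penalty for a degenerate partition (assumption)
    # Eq. (7) with q = 1: mean distance of the members to their centroid.
    S = np.array([dist[labels == i, i].mean() for i in used])
    A = A[used]
    # Eq. (8) with p = 2: Euclidean distance between centroids.
    M = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2)
    np.fill_diagonal(M, np.inf)    # makes R[i, i] = 0, so i = j never wins the max
    R = (S[:, None] + S[None, :]) / M  # Eq. (6)
    return R.max(axis=1).mean()        # Eq. (10), then Eq. (9)

# Two tight, well separated groups: S = [1, 1] and M = 10, so the index is 0.2.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
value = davies_bouldin(points, [[0.0, 1.0], [10.0, 1.0]])
```

Lower values indicate compact clusters that lie far apart, which is why the optimizer minimizes this index.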

All the optimization approaches were coded in Matlab on a 3.3 GHz i5 desktop computer with 8 GB of RAM.

The final clustering solution is then assessed through a Multivariate Analysis of Covariance (MANCOVA) test. The effect sizes were also computed using the Eta Squared (\( {\eta }^{2}\)) indicator. Finally, a Receiver Operating Characteristic (ROC) curve was generated and the area under the ROC curve (AUC) was computed to estimate the screening and diagnostic accuracy of the method (Ma et al., 2013). The ROC analysis results are interpreted as follows: AUC ≥ 0.90, high diagnostic accuracy; AUC in the range of 0.70–0.90, moderate diagnostic accuracy; and AUC < 0.70, low diagnostic accuracy (Streiner & Cairney, 2007).
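The AUC and its interpretation bands can be illustrated with a short Python sketch. The function names and the toy scores are hypothetical, and the AUC is computed through its equivalence with the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counted as one half), which is not necessarily the routine used in the study.

```python
import numpy as np

def auc(scores, is_positive):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    scores = np.asarray(scores, float)
    is_positive = np.asarray(is_positive, bool)
    pos, neg = scores[is_positive], scores[~is_positive]
    diff = pos[:, None] - neg[None, :]          # every positive against every negative
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

def interpret_auc(value):
    """Bands quoted in the text (Streiner & Cairney, 2007)."""
    if value >= 0.90:
        return "high"
    if value >= 0.70:
        return "moderate"
    return "low"

# Toy example: scale scores of four pupils, the last two in the at-risk cluster.
toy_auc = auc([1.0, 2.0, 2.0, 3.0], [False, False, True, True])
band = interpret_auc(toy_auc)
```

In the toy example, three of the four positive-negative pairs are ordered correctly and one is tied, which gives an AUC of 0.875 and falls in the moderate band.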

6 Results

MA was run 50 times. Its performance is compared to that of two state-of-the-art metaheuristics, namely Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA).

Initially, the same fine-tuning process as described by Zervoudakis and Tsafarakis (2020) was performed to determine the best parameter configuration. Following that work, each initialization scheme was run 10 times. First, the population size of both male and female mayflies was tested. The results of the algorithm according to the swarm sizes are presented in Fig. 5, where the blue line corresponds to the male mayflies and the orange line to the female ones. The findings support the use of a fixed population size of n = 40 (20 males and 20 females).

Fig. 5 Results according to the swarm sizes

The attraction constants were then tested, as presented in Fig. 6. The findings verify the use of attraction constants of a1 = 1, a2 = 1.5.

Fig. 6 Results according to attraction constants

The nuptial dance and random flight coefficients were then tested, as presented in Fig. 7. The blue line corresponds to the nuptial dance while the orange line corresponds to random flight. The findings support the use of a nuptial dance coefficient d = 0.1 and a random flight coefficient fl = 0.1.

Fig. 7 Results according to nuptial dance and random flight

Finally, the visibility coefficient β = 2 was also verified according to Fig. 8.

Fig. 8 Results according to visibility coefficient

In the same way, it was found that a single point uniform crossover with a rate of 0.95, a linear selection mechanism, and a gaussian mutation with a rate of 0.1 were the most suitable for this research. The findings were in line with the work of Zervoudakis and Tsafarakis (2020).

Repeating the process for PSO, a fixed population size of n = 50 was used, the acceleration coefficients c1 and c2 were set to 1.5 and 2, respectively, and the inertia weight w was set to change during the optimization process from 0.9 to 0.5. These settings were in line with the findings of Zervoudakis, Mastrothanasis and Tsafarakis (2020), who used a PSO algorithm for student segmentation. Finally, for GA, a fixed population size of n = 50 was used, and values of 0.8 and 0.05 were found to be the most suitable for the crossover and mutation rates, respectively. The crossover and mutation techniques used in this research are the same as the ones used in previous research for GA (Laref et al., 2019; Zervoudakis et al., 2020).

Table 2 presents the results according to DBI. Bold values correspond to the best (minimum) located values. As Table 2 demonstrates, MA’s performance is superior to both PSO and GA in terms of accuracy and efficiency.

Table 2 Performance of comparing algorithms

The convergence behavior of each algorithm is presented in Fig. 9, where the superiority of MA over PSO and GA in terms of convergence speed is demonstrated.

Fig. 9 Convergence characteristic curves of optimizers

From the results presented above, it is clear that MA constitutes a better optimization approach in terms of convergence speed and rate, compared to the other two successful optimization methods. Using MA, two clusters with homogeneous characteristics were created, as shown in Fig. 10. The first cluster (Cluster A) contains 46 individuals (5.94%) and the second (Cluster B) contains 728 individuals (94.06%).

Fig. 10 Graphical representation of clusters

Cluster A includes schoolchildren characterized by the highest scores on all three scales, while Cluster B includes the ones having the medium and lowest scores, as presented in Table 3, revealing that students in Cluster A are at higher risk when it comes to performance anxiety. As a result, the students of the first cluster are distinguished by fewer positive feelings (2.45 ± 0.44) and more mild negative feelings (2.38 ± 0.36) and strong negative feelings (2.04 ± 0.37), compared to the students of the second cluster. It is noted here that the higher the values of each variable, the more negatively charged the emotional states are.

Table 3 Descriptive statistics by variable and cluster

The multivariate tests on Pos, Mildneg and Strongneg between the students of the two groups revealed a statistically significant difference, with Cluster A being distinguished by significantly greater values of performance anxiety, as presented in Table 4, which supports the discriminant validity of the proposed method.

Table 4 Multivariate Tests

Moreover, the results of the MANCOVA test for the dependent variables are presented in Table 5.

Table 5 Multivariate Analysis of Covariance summary table

The ROC curves for the screening and diagnostic accuracy of the Pos, Mildneg and Strongneg emotions in predicting performance anxiety are presented in Fig. 11.

Fig. 11 ROC curves for the positive, mild negative and strong negative emotion dimensions

All parameters had good diagnostic ability, with Strongneg emotions being the most promising. The area under the curve (AUC) of positive emotions was 90% (95% CI: 85.8–94.1%). The AUC of mild negative emotions was 95.3% (95% CI: 93.5–97%) and that of strong negative emotions was estimated at 96.5% (95% CI: 93.6–99.4%). The mean area under the curves of the model is greater than 0.9, which suggests high diagnostic accuracy for performance anxiety.
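For readers less familiar with the AUC, the following sketch illustrates one common way to compute it, as the probability that a randomly chosen at-risk student scores higher on a scale than a randomly chosen student who is not at risk (ties counted as one half). This pairwise formulation, the function name `auc` and the toy scores are assumptions for illustration and do not reproduce the statistical procedure or data of the study. The last lines average the three AUC values reported above.

```python
def auc(scores_positive, scores_negative):
    """Pairwise (Mann-Whitney style) estimate of the area under the ROC curve.

    scores_positive: scale scores of cases labelled at risk.
    scores_negative: scale scores of cases labelled not at risk.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0   # at-risk case correctly ranked higher
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(scores_positive) * len(scores_negative))


# Toy scores (hypothetical, not from the study).
at_risk = [2.0, 3.0, 1.5]
not_at_risk = [1.0, 2.0]
print(auc(at_risk, not_at_risk))

# Mean of the three AUC values reported in the text (Pos, Mildneg, Strongneg).
reported_aucs = [0.900, 0.953, 0.965]
mean_auc = sum(reported_aucs) / len(reported_aucs)
print(round(mean_auc, 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so the mean of roughly 0.94 for the reported values is consistent with the statement that it exceeds 0.9.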

7 Discussion

In this research, a CI-based student segmentation technique was used to identify extreme levels of performance anxiety in schoolchildren. The MA-based bio-inspired clustering approach is characterized by higher predictive accuracy than common metaheuristics like PSO and GA, according to Table 2 and Fig. 9, due to its combination of swarm intelligence and evolutionary operators (Zervoudakis & Tsafarakis, 2020). Another benefit of the proposed approach is that, in cases where common metaheuristics get trapped in local optima, MA is capable of escaping thanks to its nuptial dance and random flight operators (Zervoudakis & Tsafarakis, 2020). Based on this approach, it was detected that 5.94% of school-age students experience significantly negative performance anxiety in school performances compared to other students. The screening and diagnostic accuracy of the mayfly-based clustering method was confirmed by a ROC analysis. The detected percentage is in line with previous research on music students and professional actors, whereas no relevant research on this specific age group was found (Mastrothanasis et al., 2021). In particular, studies show that prevalence rates for childhood social anxiety disorder range from 0.5 to 9.0% (Hitchcock et al., 2009). Moreover, Studer et al. (2011) reported that 12% of music students deal with stress on stage, and Papageorgi (2020) stated that 11% of adolescent musicians do as well. Finally, according to Steptoe et al. (1995), the percentage of significantly negative performance anxiety stands at 10% in populations of apprentice college actors.

Despite the promising results of the proposed method, some limitations need to be noted. First, only one dataset was used, focusing on a Greek population and addressing an anxiety disorder that develops during theatrical performances. Researchers are strongly advised to apply the proposed method to other populations and disorders. However, such a task is extremely difficult due to the complexity of data collection in the field of mental health. Collecting and preparing a set of mental health data is a demanding task, since in most cases individual diagnostic interviews must be conducted by a clinical psychologist. In this research, the dataset took a long time to collect, as the teachers who observed and recorded the reactions of the young actors first had to be trained to complete the three-scale questionnaire validly.

Second, although the results generated by the method could be used to prioritize diagnostic interviews, the clinical psychologist should always have the last word when it comes to diagnosing disorders.

8 Conclusions

The aim of this research was to identify extreme levels of performance anxiety in schoolchildren using CI-based student segmentation techniques. According to the findings, applying MA yields reliable solutions to the NP-hard problem of creating clusters of schoolchildren with common performance anxiety characteristics. Consequently, possible cases of individuals experiencing significantly negative performance anxiety in school theatrical events can be identified. A comparison with GA and PSO showed that the proposed MA-based clustering approach is distinguished by superior categorization capabilities. Moreover, MA was found to converge faster towards optimal solutions than PSO and GA.

Teachers could therefore have an application in which they enter the observation data from their students to automatically detect at-risk cases in the school environment. As a result, the drama educator can readily detect and manage learners, knowing each group's characteristics, and develop coping practices for students who are at risk. Equally important, such an application could detect students facing difficulties other than anxiety, like learning difficulties or depression, as long as the data to be entered are in quantitative form. Through such an application, more educators and psychologists could be allocated to schools with high percentages of students who are characterized by such difficulties.

As a group formation method, it can be applied both in actual classroom conditions with schoolchildren and in digital societies, like digital drama, assisting the educator since it is able to offset problems in the multi-criteria clustering of pupils (Pizzo et al., 2019). The biggest benefit of this approach is that it can be applied to a variety of pupils and suggest group formations in a minimal time of 8.6 s on average. Moreover, there is no limit to the amount of data that can be imported, nor to the evaluable factors that are collected from the students' holistic assessment as regards their performance anxiety. As a final point, testing more clustering approaches on this problem and comparing them in terms of convergence rate, convergence speed and efficiency is highly recommended, as is research on the attitudes of educators or other specialists (e.g. theatrical therapists) regarding their satisfaction with the grouping results and the extent to which the method is able to detect anxiety disorders effectively.