1 Introduction

Understanding and recognizing human emotions has always been a challenging task. Emotions play a crucial role in people's daily lives, influencing decision-making, behavior, and overall well-being. Traditionally, emotions have been studied through self-reporting methods, such as questionnaires and interviews, and through non-physiological and physiological signals. However, self-reporting methods have limitations, as they rely on subjective interpretations and can be influenced by various factors [1].

Non-physiological signals, on the other hand, refer to text-based inputs, such as written text, social media posts, or transcripts of conversations. These sources provide valuable information about individuals' thoughts, feelings, and expressions, and they carry rich contextual information, including semantic meaning, sentiment, and linguistic nuances. Additionally, they can be collected online, enabling large-scale data analysis. Despite these strengths for emotion prediction, limitations arise from the subjective nature of language and the potential for misinterpretation or ambiguity in written expression. Contextual understanding and the ability to account for sarcasm, irony, or cultural variations remain ongoing challenges [2].

In contrast to non-physiological signals, physiological signals directly measure the body's physiological responses associated with emotional experiences. Electroencephalography (EEG), heart rate variability (HRV), facial expressions, and electrodermal activity (EDA) are examples of physiological signals that capture changes in brain activity, heart rate, facial muscle movements, and skin conductance, respectively. Physiological signals offer an objective and direct measure of the body's emotional responses, providing insight into the autonomic and central nervous system changes associated with different emotional states. They are less prone to subjective interpretation and can support more accurate and immediate emotion prediction [3].

Compared to physiological signals, non-physiological signals may not fully or accurately reflect a person's emotional state because they may be under the individual's conscious control [4]. EEG signals are therefore crucial in emotion prediction due to their direct measurement of brain activity, high temporal resolution, non-invasiveness, identification of neural correlates, applications in emotion recognition, and potential for personalized interventions. They enable the study of the neural mechanisms underlying emotions, enhance prediction capabilities, and offer opportunities for tailored treatments [4,5,6,7,8]. However, human emotions are not only complicated but also sensitive to unexpected changes depending on the situation. As a result, researchers describe these complex emotions with two different approaches [9]. In the first, known as discrete basic emotion identification, emotions are divided into six main categories: joy, sadness, surprise, fear, anger, and disgust. In the other approach, known as dimensional emotion recognition, emotions are characterized by three dimensions (valence, arousal, and dominance) or two dimensions (valence and arousal) [1].

Valence, arousal, and dominance are the three dimensions commonly used in psychology to describe and measure emotions. Valence refers to the positive or negative quality of an emotion: positive emotions (e.g., happiness, joy) have positive valence, and negative emotions (e.g., sadness, anger) have negative valence. Arousal refers to the level of physiological activation or stimulation associated with an emotion: high-arousal emotions are associated with feelings of excitement, fear, or anxiety, and low-arousal emotions with feelings of calmness or relaxation. Dominance refers to the degree of control or power a person feels in response to an emotion: high-dominance emotions are associated with feeling in control, and low-dominance emotions with feelings of helplessness or powerlessness. These dimensions are often used together to provide a more complete picture of an individual's emotional state. For example, an emotion that is high in valence, arousal, and dominance might be described as a powerful, positive emotion associated with excitement and control, whereas an emotion that is low in all three dimensions might be described as a low-energy, negative emotion associated with sadness and powerlessness. Overall, the valence, arousal, and dominance dimensions are useful tools for psychologists and other researchers to assess and understand emotional experiences. Measuring these dimensions can offer insight into the complex, multifaceted nature of human emotions and help develop interventions and strategies that help individuals regulate their emotional responses [1,2,3,4,5,6,7,8,9,10].

Emotion recognition therefore plays an important role in human–machine interaction, allowing a machine to accurately detect an individual's emotional changes and provide corresponding feedback. As a result, various artificial neural networks (ANNs) and neuro-fuzzy inference systems have become essential tools for emotion recognition and have been the focus of several studies [6, 7].

ANNs are systems inspired by the learning ability of the human brain and are known to be universal approximators. Neuro-fuzzy systems, on the other hand, process data by combining the learning algorithms of ANN theory with fuzzy logic, which transforms an input set into an output using IF–THEN fuzzy rules. While the fuzzy component defines the membership functions (MFs), the ANN is used to extract the fuzzy rules from the numerical data and to tune the membership function parameters consistently during learning. Neuro-fuzzy systems are thus known as universal estimators: they handle nonlinear data well, can deal with uncertainty, and are resistant to noise [11, 12]. The adaptive neuro-fuzzy inference system (ANFIS), a special type of neuro-fuzzy system, first extracts the membership function parameters from a dataset that defines the system's behavior and then adjusts the system parameters according to a chosen error criterion by learning the features in the dataset. With this structure, ANFIS yields strong results in modeling nonlinear functions, and interest in ANFIS has therefore grown in fields such as decision-making, pattern recognition, information technology, and data analysis [11].

The radial basis function neural network (RBFNN), one of the most extensively used ANN models, is well suited to classifying multi-class and high-dimensional data, given its greater likelihood of attaining a global minimum, despite a slower convergence rate than alternative ANN models. Typically, the number of nodes in an RBFNN with a single hidden layer equals the number of training patterns, so increasing the number of training patterns increases the size of the RBFNN. The training pattern set can be reduced using statistical methods to shrink the RBFNN [12, 13].

Another ANN model is the probabilistic neural network (PNN), a widely used network whose classification, prediction, and recognition applications are derived from the probability density function. PNN, a competitive-learning model based on winner-take-all logic, provides a general solution to pattern classification problems using the Bayesian classifier approach, and it uses Parzen estimators to generate the probability density functions required by Bayesian theory. The PNN structure consists of four layers: the input layer, pattern layer, summation layer, and output layer. PNN is commonly used because of this layer structure and its fast training [14, 15].

2 Related works

Studies on EEG emotion recognition have attracted interest for an extended period, with numerous articles proposing various methodologies for developing systems capable of recognizing human emotions. These studies typically classify emotions into two, three, or four distinct classes [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]. Considering two-class studies in the literature, Moreira et al. [16] proposed a method for recognizing emotions in music based on the ANFIS system. The authors collected a dataset of 80 music excerpts, each associated with one of four emotions: happy, sad, angry, and peaceful. They then used the ANFIS system to extract features from the music excerpts and classify them according to the emotions they conveyed. The ANFIS system achieved an accuracy of 65.83% for three emotions and 88.75% for two emotions. Similarly, Degirmenci et al. [17] explored the use of Empirical Mode Decomposition (EMD) and its multivariate extension (MEMD) for emotion recognition from EEG signals, proposing an MEMD-based feature extraction method for multichannel EEG. Applied to the DEAP emotional EEG dataset with an ANN classifier, the method achieved accuracies of 72.87% for valence and 75% for arousal. Although studies involving two emotion classes are promising, their ability to describe emotions remains limited; a larger number of emotion classes typically allows a finer-grained understanding of the emotional spectrum. Therefore, classifications with more than two emotions need to be examined.

Studies using three emotions also appear in the literature [18,19,20,21,22]. For instance, Krisnandhika et al. [18] proposed a new emotion recognition model called the multi-feature deep forest (MFDF), which achieved accuracies of 69.75% and 75.7% for valence and arousal, respectively. Zhong et al. [19] proposed a regularized graph neural network (RGNN) for EEG-based emotion recognition; to better handle cross-subject EEG variations and noisy labels, the authors proposed two regularizers. The RGNN model was extensively evaluated on three emotion classes (positive, negative, and neutral) and reached an accuracy of 94.24%. Meanwhile, Qing et al. [20] proposed a new approach to EEG-based emotion recognition that targets both classification accuracy and the interpretability of emotion progression, using a machine learning-based method to extract features from EEG signals and classify emotions. Validated on the DEAP and SEED datasets with three classes, the model achieved accuracies of 63.09% and 75%, respectively. A novel approach that fuses information propagation patterns and activation differences in the brain was proposed by Li et al. [21]. Covering positive, neutral, and negative emotions, the study constructed emotion-related brain networks using the phase locking value and adopted a multiple-feature fusion approach to combine the complementary activation and connection information for emotion recognition. Evaluated on three public emotional databases, the approach reached an accuracy of 88.06% for the three classes. In another three-emotion study, Luo et al. [22] proposed a new approach utilizing generative adversarial networks (GANs): a Conditional Wasserstein GAN (CWGAN) framework developed to augment EEG data and improve the accuracy of EEG-based emotion recognition. Evaluated on the SEED and DEAP datasets, EEG data generated by the CWGAN improved the accuracy of emotion recognition models, yielding 86.96% for the three classes (SEED dataset) and 78.17% and 73.79% for the valence and arousal classes (DEAP dataset), respectively. These results show that the accuracy of three-emotion classification is still limited and can be enhanced.

On the other hand, studies classifying four emotions also exist in the literature. For instance, Liu and Sourina [23] presented a novel approach to recognizing the valence level of emotions from electroencephalogram (EEG) signals in real time using fractal analysis. The authors proposed a fractal-based method to extract valence-related features from EEG signals, which are then used to train a support vector machine (SVM) classifier in real time. The proposed method achieved an accuracy of 63.04% for arousal–dominance recognition with four classes. Zheng et al. [24] investigated the stability of EEG patterns for emotion recognition using k-nearest neighbors (KNN), logistic regression (LR), and support vector machine (SVM) approaches, evaluating various feature extraction, selection, smoothing, and classification methods on the DEAP dataset and comparing them with several state-of-the-art methods for emotion recognition from EEG signals. The proposed method achieved an accuracy of 69.67% for four classes. Mehmood et al. [25] presented an analysis of two classification methods, SVM and KNN, for emotion recognition using EEG signals. Four different emotional stimuli were presented to five male subjects, and after preprocessing the raw EEG data, Hjorth parameters were used for feature extraction; the highest accuracy, 61%, was obtained with KNN. In another four-emotion study, Islam and Ahmad [26] proposed a system for emotion recognition from EEG signals based on the Discrete Wavelet Transform (DWT), computing two significant features, wavelet energy and wavelet entropy, to detect four emotions: happy, angry, sad, and relaxed. Evaluated on the internationally recognized DEAP database and compared with other state-of-the-art methods, their method achieved 62.3 ± 1.1% accuracy. Similarly, Ravindran et al. [27] investigated the patterns of EEG signals in the human brain and proposed a system that includes a modified adaptive filtering technique for signal preprocessing, used to remove noise and artifacts from the EEG signal, together with an ANFIS for classifying and analyzing emotions based on the selected features. Compared with existing emotion recognition systems using metrics such as accuracy, specificity, and sensitivity, the system reported an average classification accuracy of 86.6%. Meanwhile, Fang et al. [28] analyzed the arousal and valence labels of the DEAP dataset, comparing the performance of their Multi-Feature Deep Forest (MFDF) model against several machine learning classifiers, including KNN, random forest (RF), and SVM. Using power spectral density (PSD) and differential entropy (DE) for feature extraction, they obtained an average accuracy of 71.05%; the MFDF model exceeded the RF, KNN, and SVM classifiers by 3.40%, 8.54%, and 19.53%, respectively. Nakisha et al. [29] also proposed a new framework to automatically search for the optimal subset of EEG features using evolutionary computation (EC) algorithms. The framework was evaluated on two public datasets (MAHNOB, DEAP) and a new dataset acquired with a mobile EEG sensor, and it involves four steps: feature extraction, feature selection, channel selection, and classification. The selected features and channels were used to classify four emotion classes with a probabilistic neural network (PNN), yielding a 67.4% accuracy rate on the DEAP dataset. Finally, Lan et al. [30] proposed a novel real-time subject-dependent algorithm based on the most stable features, offering better accuracy than other available algorithms when only one training session is available and no subsequent re-training is allowed. Tested on an affective EEG database of five subjects in which four emotions (pleasant, happy, frightened, and angry) were induced, the algorithm achieved up to 49.63% accuracy for four-emotion classification and 73.10% for positive/negative classification.

In conclusion, the findings above show that the accuracy of recognizing three or four emotions remains insufficient and needs to be improved. Moreover, the existing approaches rely on shallow learning techniques such as PNN, KNN, RF, SVM, and ANFIS classifiers, together with filtering techniques and feature extraction methods. Therefore, to improve system performance for three- and four-emotion recognition from EEG signals, the main goal of this research is to propose a hybrid structure that combines various adaptive networks with the fuzzy-logic method ANFIS. ANFIS can capture complex, nonlinear relationships in the data, while shallow techniques provide robust classification or regression capabilities. Consequently, the proposed model retains interpretability, through the capacity of ANFIS to provide fuzzy-logic rules, while also benefiting from the predictive strength of shallow models.

Two datasets were used in this study to assess the accuracy of the proposed model: the Feeling Emotions dataset and the DEAP dataset. The Feeling Emotions dataset, used for three-emotion recognition (positive, negative, and neutral), is classified by training one parallel ANFIS model per class using the first three and first five features of the EEG signals. The recognition results obtained in the first layer are then used as input to an adaptive network structure for emotion recognition. The high accuracy of emotion recognition with the selected features and the proposed hybrid model is supported by the performance results.

The other motivation of this study is to classify four emotions, namely happy, sad, fear, and relaxed, from EEG signals using positive valence and high arousal, negative valence and low arousal, negative valence and high arousal, and positive valence and low arousal, respectively. The DEAP dataset is used in the proposed model. The dataset is first divided into four classes using the valence and arousal dimensions. After the dataset is pre-processed with bandpass filtering and normalization, the features are extracted using the power spectral density (PSD) method. The accuracy of the proposed model is also compared with other four-class studies that use shallow learning techniques.

The paper is organized as follows. In Section 3, the datasets used and the proposed network structure are explained in detail. The results obtained and their discussion are given in Section 4. Finally, conclusions are presented in Section 5.

3 Material and methods

This paper proposes an emotion recognition model based on the following stages: dataset preparation, feature extraction, and classification.

3.1 Dataset settings

The dataset used in this study was created from EEG signals collected from the TP9, AF7, AF8, and TP10 electrodes of two people (one male, one female) [31, 32]. In this dataset, which includes 2548 features and 2132 samples, six movie clips were used as stimuli to evoke positive and negative emotions, and no stimulus was used to elicit the neutral emotion. In total, EEG signals were collected for 36 min. The movie clips used are shown in Table 1.

Table 1 Movie clips and emotion class

The DEAP [33] dataset includes recordings of physiological signals, such as EEG and peripheral signals, and the subjective evaluations of 32 participants (50% female, 50% male) who watched 40 one-minute music video clips selected from both positive and negative emotional categories. The videos were rated in terms of arousal, valence, dominance, liking, and familiarity. The peripheral physiological signals include ECG, EMG, galvanic skin response (GSR), respiration, and temperature.

In this paper, the EEG data were processed by removing the ECG, EMG, GSR, respiration, and temperature channels and using the 32 EEG channels. The raw data are first pre-processed, which includes normalization and frequency-band division. The EEG signals were sampled at 512 Hz, and a bandpass filter with a 40 Hz cutoff frequency was applied. After filtering, feature extraction was performed to highlight the main information in the data; the power spectral density (PSD) method was utilized to extract frequency-domain features.
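The paper does not specify the filter family or the normalization scheme. As a rough illustration of this preprocessing stage, the following sketch assumes a fourth-order Butterworth low-pass at the reported 40 Hz cutoff and per-channel z-score normalization; both are common choices, not the authors' confirmed pipeline:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(eeg, fs=512, cutoff=40.0, order=4):
    """Filter at the 40 Hz cutoff and normalize each channel.

    eeg: array of shape (n_channels, n_samples), sampled at fs Hz.
    """
    # Butterworth low-pass (assumed design) at the stated cutoff.
    b, a = butter(order, cutoff / (fs / 2), btype="low")
    filtered = filtfilt(b, a, eeg, axis=-1)
    # Per-channel z-score normalization (assumed scheme).
    mean = filtered.mean(axis=-1, keepdims=True)
    std = filtered.std(axis=-1, keepdims=True)
    return (filtered - mean) / (std + 1e-12)
```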

The valence, arousal, and dominance ratings obtained from the DEAP database are continuous numerical values ranging from 1 to 9, representing subjective experiences. The happy emotion is therefore defined by positive valence and high arousal, while the sad, fear, and relaxed emotions correspond to negative valence and low arousal, negative valence and high arousal, and positive valence and low arousal, respectively. The dominance dimension is not used in this study because a positive emotion can be either high or low in dominance, depending on the situation.

In this study, the following criteria were used to assign the labels happy, sad, fear, and relaxed to the relevant data from the DEAP dataset (a short labeling sketch follows the list).

Fear: Valence \(<\) 4.5 and Arousal \(>\) 5.5

Happy: Valence \(>\) 5.5 and Arousal \(>\) 5.5

Relaxed: Valence \(>\) 5.5 and Arousal \(<\) 4.5

Sad: Valence \(<\) 4.5 and Arousal \(<\) 4.5
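A direct implementation of these criteria might look as follows; note that treating trials with mid-scale ratings (between 4.5 and 5.5) as unlabeled and discarding them is our reading of the thresholds, not a detail stated in the paper:

```python
def label_emotion(valence, arousal):
    """Map DEAP valence/arousal ratings (1-9) to one of four labels."""
    if valence > 5.5 and arousal > 5.5:
        return "happy"
    if valence < 4.5 and arousal > 5.5:
        return "fear"
    if valence > 5.5 and arousal < 4.5:
        return "relaxed"
    if valence < 4.5 and arousal < 4.5:
        return "sad"
    return None  # mid-scale ratings fall outside all four criteria
```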

In this study, only the DEAP dataset contains down-sampled and segmented raw data; the techniques applied to the feature vectors are therefore valid only for the DEAP dataset.

3.2 Feature extraction

In the feature extraction stage, the raw signal is transformed into a feature vector that captures key information and distinctive characteristics of the signal [28]. In this study, a frequency-domain method, the power spectral density (PSD), was employed for feature extraction. The PSD measures the power of a signal as a function of frequency and is often used to analyze the frequency content of signals in various fields, such as neuroscience. The mathematical form of the power spectral density is given in Eq. (1):

$$S_{i} \left( f \right) = \frac{1}{M}\left| {\mathop \sum \limits_{k = 1}^{M - 1} y_{i} \left[ k \right]e^{ - j2\pi fk} } \right|^{2}$$
(1)

where \({S}_{i}(f)\) represents the power spectrum of the signal \({y}_{i}[k]\) and M indicates the length of the signal.
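As an illustration of this step, the sketch below computes band-averaged PSD features for one channel with Welch's averaged periodogram, a standard estimator of the spectrum in Eq. (1). The band boundaries and the 128 Hz rate (the rate of the preprocessed DEAP release) are assumptions made for the example rather than parameters stated in the paper:

```python
import numpy as np
from scipy.signal import welch

# Conventional EEG bands; the exact bands used in the paper are not listed.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 40)}

def psd_band_features(signal, fs=128):
    """Return the mean PSD in each band for a single-channel signal."""
    freqs, psd = welch(signal, fs=fs, nperseg=2 * fs)
    return np.array([psd[(freqs >= lo) & (freqs < hi)].mean()
                     for lo, hi in BANDS.values()])
```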

The processed DEAP dataset contains segmented raw data from 32 subjects, with each subject having their own file. Each file includes two arrays: a 40 × 40 × 8064 data array (video/trial × channel × data) and a 40 × 4 labels array (video/trial × label) for valence, arousal, dominance, and liking. In this study, feature extraction was carried out for the valence and arousal dimensions using 32 channels over 40 × 40 × 8064 data.

3.3 Proposed model

ANFIS is used extensively across various domains, including intelligent control, pattern matching, and classification. In these practical implementations, it characterizes the uncertain, ambiguous, subjective, linguistic, and nonlinear relation between the input and output conditions of a system, typically through a collection of rules. Nonetheless, obtaining the parameters of the membership functions and fuzzy rules can be challenging, particularly for complex problems. The ANFIS structure provides a convenient way to generate membership functions and fuzzy rules specifically designed for Sugeno-type fuzzy inference systems [34, 35].

The classification of emotions may be perceived as complex because of the numerous fuzzy rules and membership function parameters involved. Because ANFIS integrates a fuzzy inference system with an artificial neural network, it is effective at solving nonlinear and complex problems within a single framework. Therefore, the performance of the ANFIS method is explored for the identification of emotions. ANFIS structures typically yield a single output; however, in this study there are N emotion classes as outputs, so N parallel ANFIS structures were utilized.

Figure 1 shows the proposed hybrid model, which allows the network to respond dynamically to different input patterns and improve its performance. The adaptive networks used in this system are ANFIS, PNN, and RBFNN. The hybrid model for emotion recognition from EEG signals involves several steps. Firstly, the adaptive networks PNN and RBFNN are incorporated for their ability to learn complex patterns and handle nonlinear relationships in the data; their flexibility allows them to adjust internal parameters to fit the training data and capture the intricate relationships between EEG features and emotions. By combining the strengths of different models, such as the neuro-fuzzy ANFIS, the hybrid model leverages the fuzzy-logic capabilities of ANFIS and refines feature representations through the adaptive networks. This combination enhances prediction accuracy by capturing complex relationships and by allowing the networks to adjust their parameters during training.

Fig. 1

Proposed model for emotion classification from EEG signals

The relations between the inputs and outputs used to recognize emotions were analyzed using fuzzy rules, and the corresponding defuzzification equations were derived from those rules. In this study, each ANFIS in the initial layer has three or four inputs (E1, E2, E3, E4) and one output (F). The mathematical expressions for the three-emotion case of the first layer of the proposed system model are given in Eqs. (2)–(5) [35]:

Rule m: If E1 is Ki, E2 is Lj and E3 is Ml,

$${\text{THEN}}\;F_{m} = p_{m} E_{1} + q_{m} E_{2} + r_{m} E_{3} + s_{m}$$
(2)
$$\mu_{{E_{1} }} \left( m \right) = K_{i} \left( {E_{1} } \right),\;\mu_{{E_{2} }} \left( m \right) = L_{j} \left( {E_{2} } \right),\;\mu_{{E_{3} }} \left( m \right) = M_{l} \left( {E_{3} } \right)$$
(3)
$$w_{m} = \mu_{{E_{1} }} \left( m \right) \times \mu_{{E_{2} }} \left( m \right) \times \mu_{{E_{3} }} \left( m \right)$$
(4)
$$F = \frac{{\mathop \sum \nolimits_{m = 1}^{243} w_{m} F_{m} }}{{\mathop \sum \nolimits_{m = 1}^{243} w_{m} }}$$
(5)

Here, the feature vectors E1, E2, and E3 represent EEG signals related to emotional responses. Ki, Lj, and Ml are the input fuzzy sets, \({\mu }_{{E}_{N}}\left(m\right)\) is the membership function of the fuzzy sets, \(m\) indexes the rules (243 rules in total), and F represents the concentration of emotions. The parameters pm, qm, rm, and sm are identified through ANFIS 1, with the fuzzy rule index m = 1, 2, …, 243 and the membership function indices i, j, l = 1, 2, 3. These mathematical expressions are repeated for ANFIS 2 and ANFIS 3.
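For concreteness, a minimal numpy sketch of this first-layer forward pass is given below, using the generalized bell membership function employed in the experiments. The hybrid least-squares/backpropagation learning of the premise and consequent parameters is omitted; the function only evaluates Eqs. (2)–(5) for given parameters:

```python
import numpy as np
from itertools import product

def gbell(x, a, b, c):
    """Generalized bell membership function."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def anfis_forward(x, mf_params, consequents):
    """Sugeno-type forward pass of Eqs. (2)-(5).

    x           : input vector of length n_inputs
    mf_params   : (n_inputs, n_mfs, 3) array of (a, b, c) per MF
    consequents : (n_mfs**n_inputs, n_inputs + 1) linear parameters
    """
    n_inputs, n_mfs, _ = mf_params.shape
    # Membership degree of every input under every MF, Eq. (3).
    mu = np.array([[gbell(x[i], *mf_params[i, j]) for j in range(n_mfs)]
                   for i in range(n_inputs)])
    w, f = [], []
    # One rule per combination of MFs, Eq. (4); three MFs on five
    # inputs yield the 3**5 = 243 rules summed in Eq. (5).
    for r, combo in enumerate(product(range(n_mfs), repeat=n_inputs)):
        w.append(np.prod([mu[i, combo[i]] for i in range(n_inputs)]))
        f.append(consequents[r, :-1] @ x + consequents[r, -1])  # Eq. (2)
    w, f = np.array(w), np.array(f)
    return (w * f).sum() / w.sum()  # weighted-average defuzzification, Eq. (5)
```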

In the second layer, the \(F\) outputs are used as inputs to the adaptive network. The mathematical expressions obtained for the ANFIS method are shown in Eqs. (6)–(9).

Rule n: If \(F_{1}\) is \(B_{i}\), \(F_{2}\) is \(C_{j}\), and \(F_{3}\) is \(D_{l}\),

$${\text{THEN}}\;Z_{n} = \tilde{p}_{n} F_{1} + \tilde{q}_{n} F_{2} + \tilde{r}_{n} F_{3} + \tilde{s}_{n}$$
(6)
$$\mu_{{F_{1} }} \left( n \right) = B_{i} \left( {F_{1} } \right),\;\mu_{{F_{2} }} \left( n \right) = C_{j} \left( {F_{2} } \right),\;\mu_{{F_{3} }} \left( n \right) = D_{l} \left( {F_{3} } \right)$$
(7)
$$w_{n} = \mu_{{F_{1} }} \left( n \right)\mu_{{F_{2} }} \left( n \right)\mu_{{F_{3} }} \left( n \right)$$
(8)
$$Z = \frac{{\mathop \sum \nolimits_{n = 1}^{243} w_{n} Z_{n} }}{{\mathop \sum \nolimits_{n = 1}^{243} w_{n} }}$$
(9)

Here, the feature vectors F1, F2, and F3 represent the predicted EEG signals related to emotional responses from the first layer of the system. The fuzzy sets Bi, Cj, and Dl are used as fuzzy membership functions in Eq. (7), and the value Z indicates the concentration of emotions. ANFIS determines the parameters \(\tilde{p}_{n}\), \(\tilde{q}_{n}\), \(\tilde{r}_{n}\), and \(\tilde{s}_{n}\), with the fuzzy rule index n = 1, 2, …, 243 and the membership function indices i, j, l = 1, 2, 3.

In the second layer, the mathematical expressions obtained for the PNN method are shown in Eqs. (10)–(14).

$$X_{s} = \phi \left( {\left| {F_{N} - c_{{\text{s}}} } \right| \times a} \right)$$
(10)
$$\phi \left( x \right) = \exp \left( { - x^{2} } \right)$$
(11)
$$a = \frac{0.833}{p}$$
(12)
$$T_{u} = \mathop \sum \limits_{s = 1}^{h} W_{s,u}^{h} \times X_{s}$$
(13)
$$Z_{u} = \begin{cases} 1, & {\text{if}}\;T_{u} = \max \left\{ {T_{1} ,T_{2} ,T_{3} } \right\} \\ 0, & {\text{otherwise}} \end{cases}$$
(14)

where s = 1, 2, …, h and u = 1, 2, 3; h is the number of neurons in the hidden layer, which equals the number of distinct training patterns. Xs is the output of the radial basis layer, \(F_{N}\) denotes the input vector, \(c_{{\text{s}}}\) is the center vector of the s-th hidden node, p is a known real constant referred to as the spread factor, a is the biasing term of the radial basis layer, and \(\phi \left( \cdot \right)\) denotes the Gaussian nonlinear radial basis function. \(Z_{{\text{u}}}\) is the u-th output, which serves as the classification index for selecting emotions, and \({W}_{s,u}^{h}\) is the weight between the s-th hidden node and the u-th output node.
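In the standard PNN, the pattern-to-summation weights \({W}_{s,u}^{h}\) reduce to class-membership indicators: each pattern node feeds only the summation node of its own class. Under that common reading, Eqs. (10)–(14) can be sketched as:

```python
import numpy as np

def pnn_predict(f, centers, labels, p, n_classes=3):
    """Winner-take-all PNN decision of Eqs. (10)-(14).

    f       : input vector (the first-layer ANFIS outputs)
    centers : (h, d) stored training patterns c_s
    labels  : (h,) class index of each stored pattern
    p       : spread factor, giving a = 0.833 / p as in Eq. (12)
    """
    a = 0.833 / p                                   # Eq. (12)
    dist = np.linalg.norm(centers - f, axis=1)      # |F - c_s|
    x = np.exp(-((dist * a) ** 2))                  # Eqs. (10)-(11)
    t = np.array([x[labels == u].sum() for u in range(n_classes)])  # Eq. (13)
    z = np.zeros(n_classes)
    z[t.argmax()] = 1                               # Eq. (14)
    return z
```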

Additionally, in the second layer, the mathematical expressions obtained for the RBFNN method are shown in Eqs. (15) and (16).

$$\varphi_{t} = {\text{exp}}\left( { - \frac{{\left\| {F_{t} - U_{t} } \right\|}}{{\sigma_{t}^{2} }}} \right)$$
(15)
$$Z_{p} = \mathop \sum \limits_{t = 1}^{H} \varphi_{t} \omega_{pt} + \omega_{p0}$$
(16)

where \({\varphi }_{t}\) is the output of the hidden layer of the RBFNN model, and \({U}_{t}\) represents the reference vectors indicating the centers of the radial functions in the hidden layer. \(F_{t}\) is the input (feature) vector, \(\left\| \cdot \right\|\) denotes the distance function, and \({\sigma }_{t}\) is the width of the Gaussian function. The output of the RBFNN used to recognize emotions is calculated in Eq. (16), where \({\omega }_{pt}\) denotes the weight parameters. Training an RBF network determines the center vector and width of each radial function in the hidden layer and the weight parameters between the hidden and output layers.
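A literal reading of Eqs. (15) and (16) gives the short sketch below; note that the exponent uses the unsquared distance exactly as written in Eq. (15), although the squared norm is the more common Gaussian RBF variant:

```python
import numpy as np

def rbfnn_predict(f, centers, sigmas, weights, bias):
    """RBFNN output of Eqs. (15)-(16).

    f       : input feature vector
    centers : (H, d) radial-function centers U_t
    sigmas  : (H,) Gaussian widths sigma_t
    weights : (n_classes, H) output weights omega_pt
    bias    : (n_classes,) bias terms omega_p0
    """
    phi = np.exp(-np.linalg.norm(f - centers, axis=1) / sigmas ** 2)  # Eq. (15)
    return weights @ phi + bias                                       # Eq. (16)
```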

The datasets used in this study, the Feeling Emotions and DEAP datasets, contain three emotions (positive, negative, and neutral) and four emotions (happy, sad, fear, and relaxed), respectively. In the parallel models for the Feeling Emotions dataset, ANFIS 1 is used for negative-emotion training, while ANFIS 2 and ANFIS 3 are designed for positive- and neutral-emotion training, respectively. For the DEAP dataset, ANFIS 1 is used for the happy emotion, ANFIS 2 for the sad emotion, and ANFIS 3 and ANFIS 4 for the fear and relaxed emotions, respectively. The model begins by providing the feature vectors (E1, …, En, n = 1, …, N) to each ANFIS. The training process involves separately training N ANFIS models for the N emotion classes. In the first layer, emotions are pre-processed according to their class, which simplifies the problem and grades the classes. In the second layer, the decisions of the N ANFIS models are given as the feature vector of the adaptive network. The last layer then performs decision fusion and gives the final decision of the parallel structure.

For the first layer of the proposed system model on the Feeling Emotions dataset, negative emotions in the training feature vector (E1) are initially assigned to class 1, while the other emotions are assigned to class 0 (the classes of the feature vectors E2 and E3). For the ANFIS 2 model, positive emotions are assigned to class 1 and the other emotions to class 0, and the same process is applied to the ANFIS 3 model. Likewise, the relevant emotions in the DEAP training feature vector are assigned to class 1, while those of the other types are assigned to class 0, as sketched below.
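This one-vs-rest labeling can be expressed compactly; the sketch below simply builds the binary target columns that the parallel first-layer ANFIS models are trained against:

```python
import numpy as np

def one_vs_rest_targets(y, n_classes):
    """Binary targets for the parallel ANFIS layer: model k is
    trained with target 1 for its own class and 0 otherwise."""
    y = np.asarray(y)
    return np.stack([(y == k).astype(float) for k in range(n_classes)],
                    axis=1)

# e.g. with classes 0=negative, 1=positive, 2=neutral:
# one_vs_rest_targets([0, 1, 2], 3) -> [[1,0,0], [0,1,0], [0,0,1]]
# (column k is the training target of ANFIS k+1)
```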

In the second layer of the model, a new feature vector is created from the [0–1] output values of the ANFIS structures. For the Feeling Emotions dataset, the first column of the feature vector represents negative emotions, while the second and third columns represent positive and neutral emotions, respectively. The feature vector for the DEAP dataset has four columns: the first corresponds to happy emotions, the second to sad emotions, and the third and fourth to fear and relaxed emotions, respectively. This feature vector is used as the input vector of the adaptive network during training, and in the second layer, the output of the trained system indicates which emotion class the input belongs to.

In order to enhance comprehension of the system, Fig. 2 illustrates the flowchart of the proposed hybrid model. The process begins by providing feature vectors to each ANFIS, and the subsequent training involves independently calibrating the N ANFIS models corresponding to the N emotion classes. In the first layer, the data go through a feature extraction stage and are categorized into the relevant classes after being split into training and testing sets, simplifying the problem and facilitating subsequent class ranking. If the feature extraction is promising, the features to be trained are selected and become inputs to the training phase of the adaptive network; training continues until the system achieves the desired level of proficiency. The second layer combines the decisions of the N trained ANFIS models and forwards them to the PNN, RBFNN, and ANFIS structures; this step uses the initial ANFIS outputs for further training and refines the models based on the features identified during preliminary training. In the last layer, decision fusion determines the final decision of the parallel structure. The final step examines the predictive accuracy of the system and assesses how well it can predict emotions based on the features identified during training.

Fig. 2

Flowchart showing the fundamental stages of implementing the proposed hybrid method

4 Results and discussion

In order to evaluate the performance of the model, the DEAP dataset [33] and the Feeling Emotions dataset [31, 32] are utilized in this study. To obtain the results, each classification method was trained and cross-validated 15 times to ensure the stability of the accuracy.

The success of a classifier depends on the distribution and distinguishability of the chosen inputs; thus, input selection is the most important part of designing a neural network. Knowing which features of the input vector best represent a particular sample has a direct impact on classifier performance. According to the feature distribution graph in Fig. 3, the inputs that most accurately describe the Feeling Emotions dataset used for EEG-based emotion recognition in this study are the first features.

Fig. 3

Distribution of classes according to features, a 1–3 features, b 4–6 features, c 1270–1272 features, d 2546–2548 features

When the distributions are examined, the features most suitable for classification are the first ones. Therefore, the first three features of the datasets are chosen as the input vector of the proposed hybrid model. In addition, the proposed system model is trained with the first five features, equal to the maximum number of inputs used in ANFIS, to examine the effect of feature count on classifier performance.

The ANFIS classifiers are trained using a combination of the backpropagation gradient descent method and the least-squares technique, and the ANFIS fuzzy rules use the generalized bell-shaped membership function. Samples belonging to the Output 1, Output 2, and Output 3 target sets are given the binary target values (1, 0, 0), (0, 1, 0), and (0, 0, 1), respectively. To increase the classification accuracy of the study, the chosen adaptive networks (the ANFIS, PNN, and RBFNN classifiers) are trained with an input vector built from the outputs of the three ANFIS classifiers. The dataset is first divided into a training set and a test set using the five-fold method. For the Feeling Emotions dataset, the training set of 1706 samples is used to train the parallel ANFIS classifiers, and the test set of 426 samples is used to test the accuracy of the trained network. For the DEAP dataset, the corresponding split is 3842 training samples and 960 test samples.
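The exact cross-validation protocol is only outlined in the text; one consistent reading (a five-fold split whose held-out fold matches the reported 1706/426 and 3842/960 sample counts) can be sketched as follows, with `features.npy` and `labels.npy` as hypothetical stand-ins for the extracted feature matrices:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.load("features.npy")   # hypothetical: extracted feature vectors
y = np.load("labels.npy")     # hypothetical: emotion class labels

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # train the parallel ANFIS layer on the training fold and
    # evaluate the fused second-layer decision on the held-out fold
    break  # a single ~80/20 split, matching the reported sample counts
```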

The test performance of the classifiers is evaluated using classification accuracy, sensitivity, and specificity. Classification accuracy is the ratio of correct decisions to the total number of cases; sensitivity is the ratio of true positive decisions to actually positive cases; and specificity is the ratio of true negative decisions to actually negative cases. In this study, the performance of the structures used is evaluated with accuracy, specificity, sensitivity, and the mean square error (MSE).
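These four measures follow directly from the confusion counts; for a single class treated one-vs-rest they can be computed as:

```python
import numpy as np

def classification_metrics(y_true, y_pred, y_score=None):
    """Accuracy, sensitivity, specificity, and MSE as defined above."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # MSE against continuous scores when available, else hard decisions.
    target = y_score if y_score is not None else y_pred
    mse = np.mean((y_true - target) ** 2)
    return accuracy, sensitivity, specificity, mse
```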

The parallel ANFIS structure is trained with the first five features of the Feeling Emotions and DEAP datasets in the first phase of learning. The number of membership functions in ANFIS determines the number of fuzzy rules and affects the learning speed of the network; therefore, the number of membership functions clustering the input space is chosen as (3 3 3 3 3) for five inputs. For the three-input ANFIS model used in the second part of the system, membership function counts of (3 3 3) and (5 5 5) are selected separately, and the effect of the number of membership functions on system accuracy is examined.

Table 2 gives the accuracy, sensitivity, specificity, and MSE values of the proposed system model for the first five features of the Feeling Emotions dataset. In this system model, in which the outputs of the parallel ANFIS networks are given as the input vector to the RBFNN, PNN, and ANFIS structures selected as the adaptive network, the highest accuracy of 95.89% is achieved when the membership function of the three-input ANFIS classifier is chosen as (3 3 3).

Table 2 Accuracy, sensitivity, specificity and MSE rates for the first five features of the Feeling Emotions dataset

Compared with the other networks, the RBFNN model has the lowest accuracy at 90.27%, whereas the PNN model has the second-highest accuracy at 94.61%. The reason is that PNN uses the training set in a single training step and decides the class at the output layer by choosing the highest probability among the summed output values.

According to Tables 3 and 4, the second-layer network, which receives the outputs of the parallel ANFIS model trained in the first phase, is trained with different numbers of membership functions to obtain accuracy, specificity, sensitivity, and MSE values. Table 3 shows the system performance for the Feeling Emotions dataset when the number of inputs is reduced to three and the membership function count is (5 5 5). System performance improves for the RBFNN and PNN networks when the number of membership functions is increased while the input vector is reduced (see Tables 2 and 3). This is because the features describing the samples have different levels of distribution, and these distributions are most separable in the first three features of the examined dataset. However, even though the number of inputs is reduced to three, the ANFIS performance maintains a similar accuracy as the number of membership functions increases, because ANFIS expresses the input and output values through its membership functions. Examining Tables 3 and 4 together also shows that the accuracy of a low-input ANFIS system is directly affected by the number of membership functions for the Feeling Emotions dataset.

Table 3 Accuracy, sensitivity, specificity and MSE rates for the first three features of the Feeling Emotions dataset with membership functions (5 5 5)
Table 4 Accuracy, sensitivity, specificity and MSE rates for the first three features of the Feeling Emotions dataset with membership functions (3 3 3)

Table 4 gives the accuracy, sensitivity, specificity, and mean square error values obtained when the first three features are trained with the (3 3 3) membership function in the parallel ANFIS classifier. Comparing Tables 2 and 4 shows that decreasing the number of inputs reduces the accuracy of the RBFNN and ANFIS structures when the same membership function count is selected. Nonetheless, the accuracy of the PNN network remains at 94%.

The present study investigates various methods to compare and validate the classification efficacy of the proposed model for categorizing three emotions. Table 5 presents the accuracy comparison between the proposed model and existing shallow learning techniques for the three-class classification. A previous study by Zhong et al. [19] employed two regularizers, NodeDAT and EmotionDL, for detecting three emotions (positive, negative, and neutral), achieving a classification accuracy of 94.24%.

Table 5 Comparison of the accuracy for three classes on the Feeling Emotions dataset

However, our proposed model surpasses their performance by 2% accuracy on the same three classes and outperforms the approach presented by Qing et al. [20] by 20%. Compared with the SVM network model utilized by Li et al. [21], the proposed model exhibits better accuracy and comparable classification performance. As a result, the proposed model yields the highest outcomes in the comparative analysis.

Figure 4 shows the feature distribution graph of the happy, sad, fear, and relaxed emotions in the DEAP dataset. When the distributions are examined, the first features of the dataset are more separable than the others. Therefore, the proposed hybrid model is trained with the first five features, which equals the maximum number of inputs in ANFIS. Additionally, to keep the comparison consistent, only the first five features of the dataset were trained.

Fig. 4

Distribution of classes according to features for DEAP dataset, a 1–3 features, b 15–17 features

Several methods are examined to verify the classification performance of the proposed model when four emotions are classified in this study. Table 6 compares the accuracy of the proposed model with four-class studies that use shallow learning techniques. In this regard, Islam and Ahmad [26] proposed using two key features, wavelet energy and wavelet entropy, to detect four primary emotions (happy, angry, sad, and relaxed), achieving a classification accuracy of 62.3%. Our proposed model outperforms theirs by 11% on the four classes and surpasses the approach proposed by Zheng et al. [24] by 4% for the same four classes. Compared with the PNN network model used by Nakisha et al. [29], our model achieves better accuracy while demonstrating comparable classification performance on the DEAP dataset. Consequently, the proposed model achieves the highest results in the comparison.

Table 6 Comparison of the accuracy for four classes on the DEAP dataset

Table 7 gives the accuracy, sensitivity, specificity, and MSE values for the first five features of the proposed system model on the DEAP dataset. In this system model, the outputs of the trained first-layer ANFIS networks, which lie in the range 0 to 1, are given as the input vector to the RBFNN, PNN, and ANFIS structures selected as the adaptive network. The highest accuracy of 73.49% is obtained when the membership function of the four-input ANFIS classifier is selected as (3 3 3 3). The second-best accuracy of 72.10% is achieved with the PNN, while the lowest accuracy of 69.63% is obtained with the RBFNN model.

Table 7 Accuracy, sensitivity, specificity and MSE rates for the first five features of the DEAP dataset

5 Conclusions

This study aimed to develop a hybrid system model for emotion recognition from EEG signals and to evaluate its performance using ANFIS models and adaptive networks. The study achieved its objectives and made several key contributions. Firstly, it confirmed that ANFIS structures perform strongly in classifying emotions from EEG signals. In addition, the study introduced a novel hybrid system model that combines ANFIS with adaptive networks, providing a promising approach to EEG-based emotion classification. The experiments show that the proposed model achieves high accuracy in recognizing happy, sad, fearful, and relaxed emotions from EEG signals, as well as positive, negative, and neutral emotions.

Future work can build on these results by exploring the use of other physiological signals in emotion recognition, such as facial expressions and heart rate variability. Additionally, incorporating other machine learning techniques, such as deep learning algorithms, could potentially further improve the accuracy of emotion recognition.