Introduction

Over the last two centuries, research in the biological sciences has been driven by the demand for new healthcare treatments and by ongoing efforts to understand the biological underpinnings of illness [1, 2]. Recent developments in the life sciences have made it possible to investigate biological systems holistically and to access the molecular minutiae of living things like never before. Nevertheless, drawing meaningful conclusions from such data is extremely difficult because of the inherent complexity of biological systems and the high dimensionality, variety, and noise of the data [3]. As a result, new tools are needed that are accurate, dependable, robust, and capable of processing large amounts of biological data. This has inspired many researchers in the life and computer sciences to adopt an interdisciplinary strategy to clarify the workings and dynamics of living things, with notable advancements in biological and biomedical research [4]. Consequently, numerous artificial intelligence approaches, particularly machine learning, have been put forth over time to make it easier to identify, categorise, and forecast patterns in biological data [5].

Deep learning, a subset of machine learning, extracts meaningful and complementary features from a large training dataset largely without human intervention. The fundamental idea behind deep learning is to learn data representations at increasingly complex degrees of abstraction: more abstract representations at higher levels are defined in terms of less abstract representations at lower levels [6]. Because it enables a system to understand and learn complicated representations directly from raw data, this hierarchical learning process is particularly powerful and useful in a wide range of disciplines [7, 8].

Emotions play a vital role in our daily lives. Hence, emotion recognition is an important part of human–computer interaction. Emotion can be recognised from speech, facial expressions, and physiological signals. However, emotion recognition from physiological signals is the most reliable, as humans can deliberately conceal or fake their emotional expression through speech or gesture [9, 10]. Hence, emotion recognition from EEG signals has drawn the attention of many researchers over the past decade.

Our study classifies two dimensions of emotion, valence and arousal, which are the two most important parameters for describing human emotions. Valence indicates whether an emotion is positive or negative, while arousal indicates the intensity of the physiological activation produced by the emotion. Figure 1 categorises some common emotions according to their valence and arousal values.

Fig. 1 Two-dimensional valence-arousal space

Electroencephalography (EEG) is a medical procedure in which electrodes with thin wires are pasted on the scalp to detect voltage fluctuations produced by the brain's neurons. EEG signals thus capture the electrical activity of the brain and can be used to detect human emotions [10]. As mentioned earlier, emotion recognition from physiological signals is more reliable and accurate. Moreover, EEG can continuously detect changes in human emotions and can hence be used for patient monitoring [11, 12]. In our study, we have analysed EEG signals to predict human emotions accurately. The steps involved in EEG-based emotion recognition are represented in Fig. 2.

Fig. 2 Pipeline for emotion recognition from EEG signals

There are two ways to perform EEG-based emotion recognition: subject-dependent and subject-independent approaches [13]. The subject-dependent approach generally gives higher accuracy than the independent approach, but the former requires the model to be trained for each subject [14]. In our study, we have implemented both subject-dependent and independent approaches.

The main contributions of the work can thus be summarised as follows:

  • This study proposes a fuzzy ensemble approach for emotion recognition from EEG signals. This fuzzy-based approach has been applied for the first time in this domain, achieving remarkable accuracy.

  • The proposed model has been tested on two standard benchmark datasets: DEAP [15] and AMIGOS [16]. Both are benchmark datasets in the realm of EEG-based emotion recognition. It is to be noted that AMIGOS is the largest existing dataset in this domain.

  • The model also gives a satisfactory performance in both subject-dependent and subject-independent approaches. Hence, the model is flexible and can be applied to either approach as the situation demands.

  • The proposed model also shows impressive classification accuracy when tested for both valence and arousal dimensions.

The rest of the paper is organised as follows: the literature on EEG-based emotion recognition is discussed in the “Related Works” section, and the benchmark datasets used in our experiments are described in the “Datasets” section. The proposed fuzzy ensemble-based deep learning methodology is presented in the “Methodology” section, and the overall results obtained by the proposed model are explained in the “Results” section. Lastly, the conclusion, followed by some directions for future work, is given in the “Conclusion” section.

Related Works

Many research works have been conducted on EEG-based emotion recognition over the past years. Initially, supervised machine learning algorithms were implemented, but the focus later shifted mostly to deep learning-based approaches in pursuit of state-of-the-art accuracies.

Yoon and Chung [17] used fast Fourier transform features and implemented a Bayesian function and perceptron convergence algorithm, achieving an accuracy of 70.9% for the valence dimension on the DEAP dataset [15]. Dabas et al. [18] implemented machine learning models, namely support vector machine (SVM) and Naïve Bayes, and obtained accuracies of 58.90% and 78.06%, respectively, on the DEAP dataset [15].

Liu et al. [19] performed emotion recognition on the DEAP dataset using several types of features: time-domain features such as mean and standard deviation, frequency-domain features such as power spectral density (PSD), and time–frequency-domain features such as the discrete wavelet transform (DWT). Using random forest and K-nearest neighbour (KNN) models, they achieved an accuracy of 66.17% for arousal. You and Liu [20] extracted time-domain features from 5-s slices of EEG signals and implemented an autoencoder neural network, achieving an accuracy greater than 80% on the DEAP dataset. Salama et al. [21] implemented a 3D convolutional neural network (CNN) model to recognise emotions on the DEAP dataset, achieving accuracies of 87.44% and 88.49% for the valence and arousal dimensions, respectively.

By implementing a shallow depth-wise parallel CNN, Zhan et al. [22] achieved accuracies of 84.07% and 82.95% on arousal and valence, respectively, on the DEAP dataset. Allghary et al. [23] proposed an LSTM model for emotion classification on the DEAP dataset and achieved accuracies of 85.65%, 85.45%, and 87.99% on arousal, valence, and liking, respectively. Wichakam et al. [24] used band power features and an SVM model to classify emotions on the DEAP dataset, reaching an accuracy of 64.9% for valence and 66.8% for liking; they selected 10 channels for the recognition task and demonstrated that increasing the number of channels to 32 does not improve performance. Parui et al. [25] proposed an XGBoost classifier, extracting several features from the EEG signals of the DEAP dataset and optimising them; the model achieved accuracies of 75.97%, 74.206%, 75.234%, and 76.424% for the four dimensions, respectively.

Aggarwal et al. [26] combined XGBoost and LightGBM models for emotion recognition on the DEAP dataset. They achieved an accuracy of 77.1% for the valence dimension. Bagzir et al. [27] decomposed EEG signals into gamma, beta, alpha, and theta bands by applying DWT to extract the frequency spectrum characteristics of each frequency band. Then, a KNN, an SVM, and an artificial neural network were used for classification. The model achieved accuracies of 91.1% and 91.3% on valence and arousal, respectively.

A few recent works have developed robust models and tested them on multiple datasets. Siddharth et al. [28] developed a multi-modal emotion recognition model by fusing features from different modalities before classification, and evaluated it on the DEAP [15], AMIGOS [16], MAHNOB-HCI [29], and DREAMER [30] datasets. Topic et al. [31] used holographic feature maps and selected optimal channels with ReliefF and neighbourhood component analysis (NCA); the holographic feature maps were fed to a CNN, whose output was passed to an SVM classifier. They evaluated their model on the DEAP [15], AMIGOS [16], SEED [32], and DREAMER [30] datasets, obtaining their highest accuracies on DREAMER: 90.76%, 92.92%, and 92.97% on the valence, arousal, and dominance dimensions, respectively. Singh et al. [33] extracted spectrogram features from 14 EEG channels and used a CNN model to classify emotions on the AMIGOS dataset, achieving 87.5% and 75% accuracy on the valence and arousal dimensions. Garg et al. [34] used FFT and wavelet transforms to extract features from EEG signals and implemented a deep neural network (DNN) for emotion recognition; on the AMIGOS dataset, the method achieved 85.47%, 81.87%, 84.04%, and 86.63% for valence, arousal, dominance, and liking, respectively. Zhao et al. [35] used a 3D-CNN model to recognise emotions from EEG signals on the DEAP and AMIGOS datasets, evaluating both two-class (low/high arousal, low/high valence) and four-class (HAHV, HALV, LAHV, LALV) classification. The model achieved 96.61% and 96.43% on the two-class tasks and 93.53% on the four-class task on the DEAP dataset, and 97.52%, 96.96%, and 95.86%, respectively, on the AMIGOS dataset.

Motivation and Research Gap

Over the past years, research works have notably improved accuracy on the task of EEG-based emotion recognition [36, 37]. However, most of the existing works share a few common drawbacks. Firstly, most models have been tested on a single dataset only; such models can be data-dependent and may not be robust. Secondly, almost all existing works operate in either the subject-independent or the subject-dependent condition, and there is a lack of research that addresses both approaches simultaneously. Thirdly, most of the work on EEG-based emotion recognition relies on machine learning-based models; some researchers have developed customised deep learning models for the problem, but these models are often too simplistic to deal with the complexity of the problem, resulting in low classification accuracy. Keeping the above gaps in mind, this work proposes a fuzzy ensemble-based deep learning model, obtained by ensembling three complementary deep learning models: a hybrid of CNN and LSTM models, a hybrid of CNN and GRU models, and a 1D-CNN model. It is to be noted that this fuzzy-based approach has been applied for the first time in this domain. The work also attains satisfactory classification results on the AMIGOS dataset, the largest benchmark dataset in this domain. The most notable aspect of this work is that it achieves impressive results in both the subject-independent and subject-dependent cases. Additionally, the work has been tested on both the valence and arousal dimensions.

Datasets

The proposed model has been evaluated on two datasets — DEAP [15] and AMIGOS [16], both developed at Queen Mary University of London, UK.

The DEAP dataset consists of EEG signals from 32 subjects. Each subject watched 40 one-minute music videos intended to elicit different emotions while their EEG was recorded, and rated the level of valence, arousal, liking, and dominance for each video. The data is stored in 32 files, one per participant; each file contains 40 channels, of which 32 contain EEG data. The data is pre-processed and available in both Python (.dat) and MATLAB (.mat) formats; our experiments use the .dat files. The dataset also contains frontal face video recordings for 22 subjects, which can be used for multi-modal emotion recognition.

The AMIGOS dataset supports multi-modal study of the mood and affective responses of individuals to different videos. It consists of EEG, electrocardiogram (ECG), and galvanic skin response (GSR) recordings of 40 participants while they watched 16 videos; the experiment was performed both individually and in groups. Each subject assessed parameters such as valence, arousal, and familiarity. Frontal, full-body, and depth video recordings are also available. The EEG signals are pre-processed and available in both Python and MATLAB formats; for our experiments, we have used the Python files.

Methodology

Datasets

This work has utilised two standard benchmark datasets to validate the proposed method. The dataset-specific channel selection, labelling, and pre-processing steps are described in the following subsections.

DEAP Dataset

The dataset contains EEG recordings from 32 channels, of which 14 were selected for our experiment: Fp1, AF3, F3, F7, T7, P7, Pz, O2, P4, P8, CP6, FC6, AF4, and Fz [38]. The valence and arousal dimensions have been considered in our study. The labels of both dimensions take continuous values between 1 and 9 and were categorised into two classes, with label ‘1’ for high and label ‘0’ for low: any rating below 5 has been labelled ‘0’, and any rating above 5 has been labelled ‘1’.
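For clarity, this thresholding can be sketched as follows (`binarise_labels` is a hypothetical helper, not the authors' code):

```python
import numpy as np

def binarise_labels(ratings, threshold=5.0):
    """Map continuous 1-9 ratings to binary classes:
    '1' (high) for ratings above the threshold, '0' (low) below it."""
    return (np.asarray(ratings, dtype=float) > threshold).astype(int)

# binarise_labels([2.3, 7.8, 6.1]) -> array([0, 1, 1])
```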

Pre-processing

The fast Fourier transform (FFT) has been used to extract features from the EEG signals; several studies have shown that FFT-based features give better performance than traditional feature extraction methods [39, 40]. The FFT transforms a signal from the time domain to the frequency domain. To allow emotional changes to be detected quickly, the raw signals were segmented into 2-s temporal windows with 1-s overlap, and the FFT was computed for each such segment of raw data. The frequency bands considered in the present work are 4–8 Hz, 8–12 Hz, 12–16 Hz, 16–25 Hz, and 25–45 Hz.
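A minimal sketch of this windowed FFT feature extraction is given below. The 128 Hz sampling rate of the pre-processed DEAP recordings is assumed, and the helper name and band-power statistic (mean FFT magnitude per band) are our assumptions rather than the authors' code.

```python
import numpy as np

# Band edges in Hz, matching the five bands used in this work
BANDS = {'theta': (4, 8), 'alpha': (8, 12), 'low_beta': (12, 16),
         'beta': (16, 25), 'gamma': (25, 45)}

def fft_band_features(signal, fs=128, win_s=2, step_s=1):
    """Slide a 2-s window with 1-s overlap over one EEG channel and
    return the mean FFT magnitude in each frequency band per window."""
    win, step = win_s * fs, step_s * fs
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        mag = np.abs(np.fft.rfft(signal[start:start + win]))
        feats.append([mag[(freqs >= lo) & (freqs < hi)].mean()
                      for lo, hi in BANDS.values()])
    return np.array(feats)  # shape: (num_windows, num_bands)
```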

AMIGOS Dataset

The dataset contains EEG recordings of 40 subjects, some of whom did not participate in both the short- and long-video experiments. To maintain consistency in the data, we have chosen only those subjects who watched both the long and short videos. Out of 17 channels, we have selected the 14 channels that contain EEG data: AF3, F3, F7, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4.

Pre-processing

The pre-processing is done in the same way as for the DEAP dataset: the raw data is broken into segments, and FFT features are extracted using the same frequency bands. The valence and arousal labels are binarised into two classes: if the value is greater than 5, the assigned label is ‘1’; otherwise, it is ‘0’.

Candidate Models

Hybrid of CNN and LSTM Models

Our first candidate model is a hybrid of CNN and LSTM models: the CNN layers extract spatial features from the signals, while the LSTM layers extract temporal features. Fully connected layers follow the LSTM layers to produce the final prediction. All the model hyperparameters are summarised in Table 1.

Table 1 Hyperparameters of the first candidate model
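As a concrete illustration, a minimal Keras sketch of such a hybrid is given below. The layer counts, filter sizes, and input shape are illustrative assumptions on our part; the actual hyperparameters are those listed in Table 1.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_rnn(input_shape, rnn_layer=layers.LSTM):
    """Hybrid candidate: 1D convolutions extract spatial features,
    a recurrent layer models temporal dependencies, and dense layers
    produce the binary high/low prediction. All sizes are illustrative."""
    model = models.Sequential([
        layers.Input(shape=input_shape),        # (timesteps, features)
        layers.Conv1D(64, kernel_size=3, activation='relu', padding='same'),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=3, activation='relu', padding='same'),
        rnn_layer(64),                          # LSTM in this first candidate
        layers.Dense(64, activation='relu'),
        layers.Dense(1, activation='sigmoid'),  # binary valence/arousal label
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='binary_crossentropy', metrics=['accuracy'])
    return model

cnn_lstm = build_cnn_rnn(input_shape=(70, 1))   # hypothetical input shape
```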

Hybrid of CNN and GRU Models

The second model implemented in this work is similar to the first one; however, GRU layers are used in place of the LSTM layers. The GRU has a simpler structure than the LSTM and usually takes less time to train. All the model hyperparameters are summarised in Table 2.

Table 2 Hyperparameters of the second candidate model
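In terms of the sketch given after Table 1, this second candidate amounts to swapping the recurrent cell:

```python
# Same illustrative architecture, with a GRU cell in place of the LSTM
cnn_gru = build_cnn_rnn(input_shape=(70, 1), rnn_layer=layers.GRU)
```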

1D-CNN Model

The third model consists of 1D-CNN layers followed by fully connected layers. Numerous studies have shown that CNNs are very effective at extracting features from images and other data. Hence, we have chosen a CNN as our feature extractor, followed by dense layers to obtain the predictions. All the model hyperparameters are summarised in Table 3.

Table 3 Hyperparameters of the third candidate model
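A corresponding sketch of the third candidate, again with illustrative layer sizes (the actual hyperparameters are those in Table 3):

```python
def build_1d_cnn(input_shape):
    """Pure 1D-CNN candidate: a convolutional feature extractor
    followed by dense layers. All sizes are illustrative."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv1D(64, kernel_size=3, activation='relu', padding='same'),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=3, activation='relu', padding='same'),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation='relu'),
        layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='binary_crossentropy', metrics=['accuracy'])
    return model

cnn_only = build_1d_cnn(input_shape=(70, 1))
```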

Proposed Model

This paper proposes an ensemble learning approach for emotion detection from EEG signals. We have trained three individual models and combined them using the fuzzy ensemble technique and a max-voting ensemble. Each model has been trained individually for 50 epochs on both datasets before being combined, using the Adam optimiser with a learning rate of 0.001. For the subject-independent approach, the entire data is split into training, validation, and test sets in the ratio 60:20:20; other ratios were also tried, but 60:20:20 proved optimal and was therefore finalised. For the subject-dependent approach, the model is trained and tested separately for each subject, using the same 60:20:20 split per subject, and the results are averaged. The proposed model is illustrated in Fig. 3.

Fig. 3 Schematic diagram of our proposed ensemble learning approach for emotion detection from EEG signals
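A sketch of the subject-independent split and per-model training schedule described above, reusing the candidate models from the earlier sketches (`X` and `y` denote the extracted features and binary labels; the batch size is our assumption):

```python
from sklearn.model_selection import train_test_split

# 60:20:20 train/validation/test split
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

# Each candidate is trained individually before ensembling (50 epochs, Adam, lr = 0.001)
for model in (cnn_lstm, cnn_gru, cnn_only):
    model.fit(X_train, y_train, validation_data=(X_val, y_val),
              epochs=50, batch_size=64)
```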

Fuzzy Ensemble Using Gompertz Function

The proposed ensemble method generates fuzzy ranks of the different models using the Gompertz function and adaptively fuses their decision scores to make the combined prediction on the test set. In a hard voting ensemble, all models are given the same priority, which can be a disadvantage if a weak classifier is present. This disadvantage is overcome to some extent in the fuzzy approach, as weights are assigned dynamically based on the confidence measure. This approach has been applied to other problems in previous studies [41,42,43,44], and the Gompertz function has been used extensively for studying COVID-19 in recent years [41, 45,46,47]. In our study, we have implemented the re-parameterised Gompertz function to model our fuzzy ensemble.

Algorithm:

For each model, we obtain a confidence score (\(c\)) for each class; these scores are normalised so that they sum to one.

Let there be \(x\) candidate models and \(n\) number of classes. In our case, \(x\) is 3, and \(n\) is 2.

$$\sum_{i=1}^{n} c_{i}^{j}=1.0 \qquad \forall\, j\in \{1,2,\ldots,x\}$$
(1)

These confidence scores are used for calculating the fuzzy rank using the Gompertz function.

$$r_{i}^{j}=1-\exp\left(-\exp\left(-2.0\, c_{i}^{j}\right)\right) \qquad \forall\, i\in \{1,2,\ldots,n\},\ \forall\, j\in \{1,2,\ldots,x\}$$
(2)

A lower rank indicates a higher confidence score. Let \(M^{j}\) denote the set of top \(M\) ranks for a particular class. If a model's rank does not belong to the top \(M\) ranks, two penalty values are used instead, \(P1_{i}^{j}\) and \(P2\): \(P1\) is obtained by putting \(c_{i}^{j}=0\) in Eq. (2), and \(P2\) is 0. Next, we calculate two more factors, the rank-sum (\(S^{j}\)) and the complement of confidence score factor (\(F^{j}\)), in the following way:

$$S^{j}=\sum_{j=1}^{x}\begin{cases} r_{i}^{j}, & \text{if } r_{i}^{j}\in M^{j}\\ P1_{i}^{r}, & \text{otherwise}\end{cases}$$
(3)
$$F^{j}=1-\frac{1}{x}\sum_{j=1}^{x}\begin{cases} c_{i}^{j}, & \text{if } r_{i}^{j}\in M^{j}\\ P2_{i}^{c}, & \text{otherwise}\end{cases}$$
(4)

Here, \(P1_{i}^{r}\) and \(P2_{i}^{c}\) are the penalty terms imposed on pattern class \(i\) if it does not belong to the top \(M\) class ranks. The final score (\(SC^{j}\)) is the product of \(S^{j}\) and \(F^{j}\):

$$SC^{j}=S^{j}\times F^{j}$$
(5)

Finally, the resultant class is the one with the minimum \(SC^{j}\) value, which constitutes the final decision of the proposed ensemble model.
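Putting Eqs. (1)–(5) together, a sketch of the fuzzy rank-based fusion for a single test sample is shown below. The choice of \(M\) (here 2 of the 3 candidates) and all names are our assumptions; because the score multiplies the rank-sum by the complement of the mean confidence, the lowest score wins.

```python
import numpy as np

def gompertz_rank(c):
    """Re-parameterised Gompertz function of Eq. (2); lower rank = higher confidence."""
    return 1.0 - np.exp(-np.exp(-2.0 * c))

def fuzzy_gompertz_fusion(confidences, top_m=2):
    """confidences: (x models, n classes) softmax scores, rows summing to 1 (Eq. (1)).
    Returns the class index with the minimum fused score (Eq. (5))."""
    x, n = confidences.shape
    ranks = gompertz_rank(confidences)            # Eq. (2)
    p1 = gompertz_rank(0.0)                       # penalty rank P1 (c = 0 in Eq. (2))
    p2 = 0.0                                      # penalty confidence P2
    scores = np.empty(n)
    for i in range(n):
        top = set(np.argsort(ranks[:, i])[:top_m])  # top-M (lowest) ranks for class i
        s = sum(ranks[j, i] if j in top else p1 for j in range(x))                  # Eq. (3)
        f = 1.0 - sum(confidences[j, i] if j in top else p2 for j in range(x)) / x  # Eq. (4)
        scores[i] = s * f                         # Eq. (5)
    return int(np.argmin(scores))

# Example: three candidate models, two classes (low = 0, high = 1)
probs = np.array([[0.2, 0.8], [0.3, 0.7], [0.6, 0.4]])
print(fuzzy_gompertz_fusion(probs))  # -> 1 (high)
```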

Results

All programs have been run on the Google Colab platform, using the Tesla T4 GPU it provides. The performance metrics used for evaluating our model are accuracy and F1-score, defined as follows:

$$\mathrm{Accuracy}=\frac{\mathrm{True\;Positive}+\mathrm{True\;Negative}}{\mathrm{True\;Positive}+\mathrm{True\;Negative}+\mathrm{False\;Positive}+\mathrm{False\;Negative}}$$
(6)
$$\mathrm{F1\text{-}score}=\frac{2}{\mathrm{Recall}^{-1}+\mathrm{Precision}^{-1}}$$
(7)
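These correspond directly to scikit-learn's built-in metrics; as a usage sketch (with `per_sample_probs` a hypothetical array of the candidates' per-sample softmax outputs):

```python
from sklearn.metrics import accuracy_score, f1_score

# Fuse each test sample's (models x classes) probabilities, then score
y_pred = [fuzzy_gompertz_fusion(p) for p in per_sample_probs]
print('Accuracy:', accuracy_score(y_test, y_pred))  # Eq. (6)
print('F1-score:', f1_score(y_test, y_pred))        # Eq. (7)
```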

DEAP Dataset

Subject-Independent Approach to DEAP Dataset

The evaluation metrics for different models on the DEAP dataset for the subject-independent approach have been summarised in Table 4. It can be seen that the ensemble model outperforms the individual models.

Table 4 Comparison of performance of different models for the subject-independent approach on the DEAP dataset

Valence Dimension

The variation of training and validation accuracies with epochs for the different models is shown in Fig. 4. Both training and validation accuracies increase rapidly at first; the rate of increase then slows, and the curves almost flatten near 200 epochs, indicating that training has converged.

Fig. 4 Variation of training and validation accuracies with epochs for the valence dimension on the DEAP dataset: a model 1 (1D CNN + LSTM), b model 2 (1D CNN + GRU), c model 3 (1D CNN only)

The confusion matrix obtained by the proposed fuzzy ensemble model is represented in Fig. 5. From Fig. 5, it can be seen that the ratio of wrong predictions to correct predictions is approximately 10% for both high and low valence.

Fig. 5 Confusion matrix obtained from the proposed fuzzy ensemble model for the valence dimension on the DEAP dataset

Arousal Dimension

The variation of training and validation accuracies with epochs for the three candidate models is shown in Fig. 6. The accuracy becomes more or less stable near 200 epochs, indicating that training is complete.

Fig. 6 Variation of training and validation accuracies with epochs for the arousal dimension on the DEAP dataset: a model 1 (1D CNN + LSTM), b model 2 (1D CNN + GRU), c model 3 (1D CNN only)

The confusion matrix obtained by the proposed fuzzy ensemble model is represented in Fig. 7. From Fig. 7, it can be seen that the misclassification rate is slightly higher for low arousal as compared to high arousal.

Fig. 7 Confusion matrix produced by the proposed fuzzy ensemble model for the arousal dimension on the DEAP dataset

Subject-Dependent Approach to DEAP Dataset

For the subject-dependent approach, we have taken the average of the test results over each of the 32 subjects; these averages for the different models are presented in Table 5. In this approach, the ensemble model outperforms the individual models for both the arousal and valence dimensions.

Table 5 Comparison of performance of different models for the subject-dependent approach on the DEAP dataset

AMIGOS

Subject-Independent Approach

The evaluation metrics for the different models on the AMIGOS dataset for the subject-independent approach have been summarised in Table 6. The ensemble model surpasses the accuracies achieved by the individual models, and we have obtained state-of-the-art accuracies for both the valence and arousal dimensions.

Table 6 Comparison of performance of different models for the subject-independent approach on the AMIGOS dataset

Valence Dimension

The variation of training and validation accuracies with epochs for the three candidate models is shown in Fig. 8. The accuracy increases sharply at first, and the curves flatten near 50 epochs; hence, the models were not trained for further epochs.

Fig. 8 Variation of training and validation accuracies with epochs for the valence dimension on the AMIGOS dataset: a model 1 (1D CNN + LSTM), b model 2 (1D CNN + GRU), c model 3 (1D CNN only)

The confusion matrix produced by the proposed fuzzy ensemble model for the valence dimension is represented in Fig. 9. It can be observed from Fig. 9 that the ratio of incorrect to correct predictions is around 0.024 for low valence and about 0.006 for high valence.

Fig. 9 Confusion matrix obtained by the proposed fuzzy ensemble model for the valence dimension on the AMIGOS dataset

Arousal Dimension

The variation of training and validation accuracies with epochs for the different candidate models has been shown in Fig. 10. The training and validation accuracies reached a more or less constant value at 50 epochs for each model; hence the models were trained for 50 epochs only.

Fig. 10 Variation of training and validation accuracies with epochs for the arousal dimension on the AMIGOS dataset: a model 1 (1D CNN + LSTM), b model 2 (1D CNN + GRU), c model 3 (1D CNN only)

The confusion matrix produced by the proposed fuzzy ensemble model for arousal dimension is represented in Fig. 11. The misclassification percentage for low arousal is approximately 1.5% and that of high arousal is approximately 1.7%.

Fig. 11 Confusion matrix obtained by the proposed fuzzy ensemble model for the arousal dimension on the AMIGOS dataset

Subject-Dependent Approach

For the subject-dependent approach, we have taken the average of the test results over the selected subjects; these averages for the different models are presented in Table 7. It is evident from Table 7 that the proposed fuzzy ensemble-based deep learning model gives better results for both the valence and arousal dimensions.

Table 7 Comparison of performance of different models for the subject-dependent approach on the AMIGOS dataset

Comparison to Existing Works

The performance of our model has been compared to previous models in Table 8. It can be observed from Table 8 that our proposed fuzzy ensemble-based deep learning model has outperformed almost all the existing models for both DEAP and AMIGOS datasets.

Table 8 Performance comparison of our fuzzy ensemble-based deep learning model with some state-of-the-art deep learning models proposed by previous researchers

Conclusion

This paper proposes a fuzzy ensemble-based deep learning approach to classify emotions from EEG signals. Emotion recognition from EEG signals is a very challenging task [48], and accurate predictions are crucial given its applications in the medical domain [49, 50]. The proposed model has achieved state-of-the-art results on the benchmark AMIGOS dataset, the largest dataset in this domain: accuracies of 98.73% and 98.39% on the valence and arousal dimensions, respectively, in the subject-independent setup, and 99.38% and 98.66%, respectively, in the subject-dependent setup. The model has also achieved satisfactory results on the standard DEAP dataset: for the subject-independent approach, accuracies of 90.84% and 91.72% on the valence and arousal dimensions, respectively, and for the subject-dependent approach, 95.78% and 95.97%, respectively. It is to be noted that the running time of the proposed fuzzy ensemble-based model is approximately 446 s for the DEAP dataset and 759 s for the AMIGOS dataset, showing that the model produces its results in comparatively little time. Admittedly, the Gompertz-based fusion is mathematically expensive, as the fuzzy measure must be calculated for each individual candidate model as well as for groups of models; ensembling can also be costly in terms of both time and space, and it reduces model interpretability owing to the increased complexity. Nevertheless, the proposed ensemble model produces impressive accuracies in both the subject-dependent and subject-independent cases, which is the most notable aspect of the present approach. One important limitation of the present work is that the proposed model has been designed for inputs taken from EEG signals only, so it may not perform well for multi-modal inputs.

In the future, we would like to develop a multi-modal model for emotion recognition by combining EEG signals and video recordings as inputs; due to resource constraints, we could not develop such a model in this paper. Future work could also focus on converting EEG signals to the image domain and then training classifiers for emotion recognition, which would reduce the input size and hence increase model efficiency.