1 Introduction

Mental stress is a widespread and persistent problem faced by people worldwide, regardless of ethnicity, age, religion, or gender [1, 2]. The Global Organization for Stress reported mental stress as the number one health problem among high school students. In addition, the American Institute of Stress stated that 48% of people have difficulty sleeping because of stress [3]. Stress is a condition that limits a person's ability to function and disrupts daily routines. In psychology, stress is defined as a process of two consecutive phases: the perception of a stressor or a situation, and the body's response to it [4, 5]. Stress can be triggered when a person encounters adverse stimuli, which may be mental (tasks that test cognitive capacity or rapidly changing tasks), physical (such as sleep deprivation or painful stimuli), or emotional (such as emotional videos). These stressors can be internal, depending on the individual's personality, thinking, and perception, or external, such as financial debt or relationship problems [7]. The body's reaction to a stimulus is known as the stress response, in which the body elicits a pattern of behavioral, cognitive, and affective responses to cope with the situation [6].

Mental stress is categorized into two main types: acute and chronic stress. When a person is exposed to a stressor for a short duration, such as a job interview or public speaking, the result is acute stress. Several changes in the body's physiological processes occur to help the person cope with the stressful situation, including the release of stress hormones such as adrenaline, noradrenaline, and cortisol that supply the body with instant energy. Later, the parasympathetic nervous system helps regulate the body back to homeostasis (its normal condition) without any significant harm [8]. On the other hand, frequent and long-term exposure to stressors, including bad relationships, stressful jobs, and poor sleep habits, may result in chronic stress. Continuous exposure to stressful situations has deleterious consequences, affecting an individual's mental and physical health [9].

Stress is a major threat that leads to a wide variety of health problems [10]. Several studies have shown that mental stress contributes to diseases such as hypertension [11], stroke [12], coronary artery disease [13], cardiac arrest [14], exhaustion of the muscular system and persistent pain [15], as well as psychological disorders such as anxiety and depression [16]. According to [17], around 35% of the somatic symptoms patients report cannot be explained by any physical cause; the authors argued that stress may be the leading driver of these symptoms, underscoring how dangerous stress is to one's health. For these reasons, researchers have proposed and developed several ways of assessing stress levels early on to prevent the harmful health consequences associated with it.

Mental stress activates the sympathetic branch of the autonomic nervous system, which affects the body psychologically, behaviorally, and physiologically [18]. Clinicians and psychiatrists tend to evaluate an individual's mental stress by assessing its psychological effects with self-report questionnaires, which are the most prevalent approach to evaluation. Several questionnaires have been used to assess stress, including the Perceived Stress Scale [19,20,21], the Daily Stress Inventory, and the Relative Stress Scale. Nevertheless, their use is debated, as they are subjective and prone to invalid answers and errors arising from social desirability bias and response bias [22]. Alternatively, behavioral responses, which can be visual, vocal, or nonverbal indications such as rapid eye movement and body gestures, have been used to assess stress [23]. However, such behavior can be altered under conscious control.

Physiological changes in the body, on the contrary, are involuntary in the sense that they are influenced directly by the autonomic nervous system [23]. They can therefore provide an objective means of evaluating mental stress compared to the methods stated above. Physiological measurements include pupil diameter [24], skin temperature [25], eye gaze [26], voice [27], heart rate variability [28], blood volume pressure [29], and electrodermal conductance [30]. Note that these measurements have limitations, as they are influenced by many factors other than mental stress, such as the person's health and environmental conditions. For example, electrodermal conductance is highly sensitive to skin diseases and to environmental conditions such as humidity and temperature [31, 32]. Cortisol levels are also highly variable, as they are easily influenced by several factors such as the circadian rhythm (i.e., fluctuating during the day), physical activity level, eating, specific medications, and certain diseases [33, 34].

In addition, many researchers have employed neuroimaging techniques to evaluate mental stress directly or indirectly. These methods include functional near-infrared spectroscopy [35,36,37], electroencephalography (EEG) [38,39,40,41], positron emission tomography [42, 43], and functional magnetic resonance imaging [44, 45]. Among these methods, EEG is the most prevalent technique used to study the brain's condition and function, in clinical applications as well as research studies. The main advantages of EEG over other neuroimaging techniques are its high temporal resolution, modest set-up cost, and simplicity of use. EEG is a non-invasive method (i.e., it does not require surgery) that measures and records the oscillations generated by the electrical activity of the brain via electrodes placed over the scalp [46]. EEG recordings have a peak-to-peak amplitude of no more than 100 µV; any recorded signal of a higher amplitude is an artifact. There are standard EEG frequency bands, each corresponding to a specific mental state: delta (1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (> 30 Hz) [47].

To perform mental stress assessment using the EEG modality, two major steps are undertaken: feature extraction and selection, followed by stress classification. Features extracted from EEG can be roughly categorized into three main types: time-domain (also known as temporal) features, frequency-domain (also known as spectral) features, and statistical features. These features are then fed to various machine learning classifiers to assess an individual's level of stress.

Machine learning is a domain of artificial intelligence that uses past data patterns and trends to make predictions about future data. Machine learning offers a variety of algorithms, including Decision Trees, Polynomial Classifiers, Random Forests (RF), Support Vector Machines (SVMs), Naive Bayes (NB), boosted classifiers, and more. For detecting mental stress from biological signals, the most common machine-learning techniques include K-nearest neighbors (KNN), Logistic Regression, SVM, and Random Forest [48]. SVM [49], Logistic Regression (LR) [50], Naive Bayes [51], KNN [52], and Linear Discriminant Analysis (LDA) [53] have been the most significant when dealing with EEG signals specifically. However, the most critical task in traditional machine learning is feature selection, which has a great effect on the classification results [54]. For instance, Saeed et al. [55] reported an accuracy of 65.96% when applying KNN with combined features such as alpha asymmetry, beta, and gamma waves, whereas Darzi et al. [56] achieved an accuracy of 90.0% when power spectral density, laterality index, correlation coefficient, and phase-slope index were used as features for the KNN classifier. In recent years, there has been an increasing trend in the use of deep learning architectures for evaluating mental stress [57]. Not only does deep learning eliminate the need for time-consuming feature selection, but it also does not require human intervention or prior expert knowledge [58]. Deep learning offers several models, such as Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Long Short-Term Memory networks (LSTMs) [59]. This paper aims to fill a knowledge gap by reviewing EEG-related deep learning algorithms for the evaluation of mental stress, with a focus on CNNs and LSTMs, as they are the most widely used deep learning models for this task.

Research studies applying deep learning classification methods to EEG signals for assessing mental stress have yielded inconsistent findings. The same deep learning technique, such as CNN, has produced a wide range of accuracies across different studies; for example, raw EEG data produced classification accuracies of around 62%, while spectral and topographical representations produced accuracies of up to 88%. In light of this challenge, our work conducts a comprehensive review of published papers related to mental stress assessment and classification using deep learning architectures. Our goal is not only to assess the existing body of research but also to propose potential avenues for future investigations.

While prior studies have commendably provided comprehensive reviews of EEG signal classification through machine learning techniques [31, 60], this paper distinguishes itself as the pioneering effort to deliver a dedicated review centered solely on the application of deep learning methodologies for EEG-based mental stress assessment. This endeavor is of profound significance, as it holds the potential to significantly advance our understanding of the neural underpinnings of mental stress, while also shedding light on the analytical intricacies associated with the fusion of both machine and deep learning techniques. Within the scope of this review, we meticulously scrutinize each paper's model architecture, parameter configurations, and their compatibility with the specific EEG input representations employed, emphasizing the pivotal role that deep learning plays in this critical domain of research.

In summary, our paper represents pioneering research as it explores the relationship between various categories of deep learning methods and the most suitable EEG input formulations. Additionally, we offer recommendations concerning the most common and optimal layer counts and the choice of activation functions based on our comprehensive analysis of the reviewed literature. Lastly, our work puts forth suggestions for future research directions, including the exploration of hybrid models and other innovative approaches. This review promises to be of great interest in the field of stress detection, offering valuable insights and recommendations to researchers.

Thus, the contributions of this review paper can be summarized as follows:

  1. This review paper emphasizes the application of Deep Learning (DL) techniques for classifying mental stress using EEG data, distinguishing it from conventional Machine Learning (ML) approaches.

  2. The review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, employing strict criteria for including and excluding studies.

  3. A bias risk analysis was also performed to evaluate the quality and reliability of the included studies.

  4. The paper explored various DL model architectures and their associated EEG input formulations, providing a comprehensive understanding.

  5. Based on the findings from the review and analysis, recommendations are provided to guide researchers in selecting DL models and suitable EEG input formulations for optimal performance.

  6. The review identifies critical knowledge gaps within the existing literature and suggests areas for future research and advancements in the field of EEG-based mental stress classification using DL.

The rest of the paper is organized as follows. The materials and methods are described in Sect. 2, where the search strategy, the inclusion and exclusion strategy, and the variables of interest are stated. EEG signal extraction and pre-processing are presented in Sect. 3. Section 4 provides theoretical background on deep learning as well as an overview of the basic CNN and LSTM architectures. Section 5 reviews different CNN, LSTM, and hybrid models that have been used to quantify stress levels. The discussion of the findings of the reviewed papers is given in Sect. 6. Finally, Sects. 7 and 8 summarize the main challenges and conclusions of research on EEG-based stress estimation.

2 Materials and methods

2.1 Search strategy

This review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Different databases were used to search for publications, namely Google Scholar, PubMed, Science Direct, and IEEE Xplore. The following combinations of keywords were used:

  • ‘EEG’ AND ‘deep learning’

  • ‘EEG’ AND ‘deep learning’ AND ‘mental stress’ OR ‘mental workload’

  • ‘EEG’ AND ‘mental stress’ OR ‘mental workload’ AND (‘CNN’ OR ‘convolution’ OR ‘deep learning’)

  • ‘EEG’ AND ‘mental stress’ OR ‘mental workload’ AND (‘LSTM’ OR ‘deep learning’)

  • ‘EEG’ AND ‘deep learning’ AND ‘mental stress’ OR ‘mental workload’ AND (‘LSTM’ OR ‘neural network’ OR ‘CNN’ OR ‘convolution’).

In addition, the citation and reference lists of each paper were examined for any additional relevant studies. Figure 1 shows the search strategy used.

Fig. 1. PRISMA study selection flow chart diagram

2.2 Inclusion and exclusion strategy

Duplicates across databases were eliminated, as were studies that did not meet the following inclusion criteria:

  • Only neural networks with at least two hidden layers were considered deep learning and included in this review.

  • Only studies employing deep learning models for classification were included; those using alternative classifiers such as traditional machine learning were excluded.

  • This review focused exclusively on the use of EEG data to classify tasks performed by human subjects. All other studies, such as power analyses, non-human studies, and feature-selection work with no final classification step, were excluded.

  • Studies were included if they pertained to the classification of mental stress, while those related to depression, anxiety, emotional behavior, or suicidal tendencies were excluded.

  • Studies with fewer than 10 subjects were excluded from the analysis and the comparison between architectures and models because, with such small sample sizes, their potential significance in the context of mental stress cannot be definitively established.

  • Due to the fast development of this field of research, only papers published between January 2018 and November 2022 were considered in this review article.

2.3 Risk of bias assessment

The risk of bias in the included papers has been evaluated as demonstrated in Fig. 2.

Fig. 2. Risk of bias graph of the included studies in this review article

2.4 Data of interest

The main variable categories collected from each paper were as follows:

  1. Experimental environment:

    • Stressor type

    • Number of participants/subjects

    • Experiment duration

  2. EEG preprocessing techniques

  3. Type of EEG input:

    • EEG signal feature type

    • Number of electrodes/channels used

    • Electrode location

  4. Deep learning algorithms:

    • Deep learning architecture used

    • Hyperparameters, such as the number of layers, activation functions, etc.

Figure 3 shows the number of papers published over the last five years on evaluating mental stress using EEG and deep learning. It can be seen that applying deep learning to mental stress assessment has attracted increasing interest in recent years.

Fig. 3. Number of publications per year on deep learning, mental stress, and EEG

3 Electroencephalogram

Electroencephalography (EEG) is a neuroimaging technique used to monitor and measure the electrical activity and function of the brain over time. Montages of 32 or 64 electrodes are the most common settings for EEG [47]. In the human brain, different regions are associated with different, specific types of activity and function; thus, the placement of the electrodes on the scalp is essential. The standard method for localizing the electrodes is the international 10–20 electrode system, shown in Fig. 4, in which electrodes are named with letters and numbers referring to the brain lobe and hemisphere over which they are placed [61], as shown in Fig. 5.

Fig. 4. The international 10–20 electrode system localization [62, 63]

Fig. 5. Electrode naming using letters and numbers

Usually, EEG amplitude ranges between 10 and 100 µV. Higher amplitudes or anomalous patterns are considered artifacts, which can be either technical or physiological [64]. This is a concern because artifacts may mimic cognitive activity or mental disorders and thus bias diagnoses or clinical research results [65]. Before EEG signals can be used to evaluate mental stress, they must therefore go through considerable preprocessing. According to [66], non-physiological artifacts can be easily eliminated by carefully controlling the experiment and applying linear filters such as high-pass, band-pass, and notch filters. Physiological artifacts, on the other hand, are difficult to handle because they usually overlap with the recorded EEG signal, requiring more advanced preprocessing methods such as ensemble averaging, optimum filtering, and Independent Component Analysis (ICA).

Table 1 shows some of the most common filtering techniques utilized by the reviewed studies, which mainly include ICA, a high-pass filter, and a low-pass or band-pass filter. Therefore, it is recommended to start with these initial filtering techniques to clean the EEG signals; if needed, more sophisticated methods can then be tested.

Table 1 Filtering techniques used by the reviewed research papers
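As a concrete illustration of this initial cleaning stage, the snippet below applies a band-pass and a notch filter with SciPy. It is a minimal sketch, not taken from any reviewed study; the sampling rate, cut-off frequencies, and 50 Hz mains frequency are illustrative assumptions.

```python
# Minimal sketch of initial EEG cleaning: band-pass plus notch filtering.
import numpy as np
from scipy import signal

fs = 256.0                                   # assumed sampling rate (Hz)
eeg = np.random.randn(8, 10 * int(fs))       # placeholder data: 8 channels, 10 s

# Band-pass 1-45 Hz (4th-order Butterworth, applied forward-backward for zero phase)
b, a = signal.butter(4, [1.0, 45.0], btype="bandpass", fs=fs)
eeg_bp = signal.filtfilt(b, a, eeg, axis=-1)

# Notch filter at 50 Hz (assumed mains frequency) to suppress power-line interference
b_n, a_n = signal.iirnotch(w0=50.0, Q=30.0, fs=fs)
eeg_clean = signal.filtfilt(b_n, a_n, eeg_bp, axis=-1)
```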

In order to perform mental stress assessment using EEG, there are several main steps to follow, namely EEG data acquisition, pre-processing, feature extraction, feature selection, and classification [51]. EEG data acquisition and preprocessing have been discussed previously. Various alternative approaches may be employed in the feature extraction step to extract several distinct features from the same raw data. Features extracted from EEG can be classified into three main categories: time-domain (also known as temporal features), frequency-domain (also known as spectral features), and time–frequency features.

Time-domain features provide temporal information about the EEG signal. Various time-domain features have been used in quantifying mental stress, including Hjorth parameters [78], Higuchi's fractal dimension [79], and entropies such as Shannon entropy [52], approximate entropy [80], and wavelet sum of entropy [81]. Frequency-domain features, on the other hand, provide useful information about the pattern and characteristics of the signal. They can be extracted from the filtered signal using techniques such as the discrete wavelet transform [82] or the short-time Fourier transform [83], which decompose the signal into the clinical EEG frequency bands: delta (1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (> 30 Hz). Each of these bands has different frequencies and amplitudes corresponding to specific brain states (such as awake, alert, or asleep). Several spectral features have been implemented for evaluating mental stress, such as power spectral density [84], absolute power [85], relative power [86], wavelet transforms [87], and Gaussian mixtures [88]. Moreover, some papers have utilized EEG time–frequency features in assessing mental stress. This approach allows information to be extracted from both domains, time and frequency, simultaneously [89]. Time–frequency features can be used to produce spectrogram images or topography maps [90], which can then serve as EEG input features for the classifiers [91, 92].
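For illustration, the sketch below computes absolute band powers in the standard clinical bands from Welch's power spectral density. The band edges follow the text above; the sampling rate, window length, and the approximation of band power as a summed PSD are assumptions of this example, not of any reviewed study.

```python
# Minimal sketch of spectral (frequency-domain) feature extraction: band powers.
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_powers(eeg, fs=256.0):
    """Return a (channels x bands) array of absolute band powers."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs), axis=-1)
    df = freqs[1] - freqs[0]                       # frequency resolution
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[..., mask].sum(axis=-1) * df)   # integrate PSD over the band
    return np.stack(feats, axis=-1)

features = band_powers(np.random.randn(8, 2560))   # 8 channels -> shape (8, 5)
```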

4 Deep learning in mental stress

Deep learning is now one of the most prominent research trends, owing to its tremendous success [85, 93]. The tangible edge of deep learning algorithms over traditional machine learning is the ability to learn and select features jointly with classifier training. Deep learning also shines when handling large amounts of data compared to traditional machine learning, provided the hyperparameters are chosen wisely and the additional data carries more information. Several deep learning architectures—including artificial neural networks, autoencoders, convolutional neural networks, Recurrent Neural Networks (RNNs), and their combinations—have been implemented in several fields. Nevertheless, when it comes to evaluating mental stress using EEG, the two most common models are Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs).

CNNs excel in extracting spatial features, which is crucial in analyzing EEG data where spatial patterns play a significant role. On the other hand, LSTMs can capture temporal information, allowing for the modeling of sequential patterns in EEG signals over time. The decision to focus on CNNs and LSTMs in this review is based on their proven effectiveness in handling the unique characteristics of EEG data related to mental stress. This strategic focus aligns with the broader trend observed among researchers, where CNNs and LSTMs have consistently proven their efficiency in mental stress assessment. By leveraging CNNs for spatial information and LSTMs for temporal dynamics, this approach aims to capture a comprehensive representation of the complex EEG patterns associated with stress states, ultimately contributing to a more enhanced understanding of mental stress assessment.

4.1 Convolutional neural networks (CNNs)

A CNN is a type of deep neural network capable of recognizing and classifying certain features in a signal or an image. Its major benefit over its predecessors is that it introduces convolution operations into neural networks, which help to learn equivariant features. CNNs are good at automatically recognizing important features without the need for human intervention, making them the most widely utilized architecture. The simplest CNN architecture is composed of three main layer types: convolutional layers, pooling layers, and fully connected (FC) layers [94], as shown in Fig. 6. These layers are stacked and organized to perform two consecutive basic functions, namely feature extraction and classification.

Fig. 6. A typical architecture of a convolutional neural network [95]

In the feature extraction operation, two main layer types are responsible for the process: convolutional layers and pooling layers. In the classification phase, a Fully Connected (FC) layer with an activation function is employed on the features extracted in the previous phase to perform class prediction.

Convolutional layers: A convolutional layer is made up of several convolutional filters, also known as kernels, which are convolved with the input image [96]. Convolution of an input with a kernel can be thought of as shifting the filter from one corner (e.g., top left) to the other corner (e.g., bottom right) in steps/strides. During each shift, the dot product of the kernel coefficients with the overlapping input is computed and placed at the output. One can choose different stride values, which influence the output size and reduce the overlapping of receptive fields. In addition, the input can be padded with zeros before convolution. This keeps the output size consistent with the input [97]. An example of convolution where a 5 × 5 image is convolved using a 3 × 3 filter can be found in [98].
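The toy example below (not the example from [98]) makes this sliding dot product explicit for a 5 × 5 input and a 3 × 3 kernel; it implements the cross-correlation used in CNN layers, with the stride as a parameter.

```python
# Worked illustration of a "valid" 2-D convolution (as computed in CNNs).
import numpy as np

def conv2d(x, k, stride=1):
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * k)        # dot product of kernel and patch
    return out

x = np.arange(25, dtype=float).reshape(5, 5)     # 5x5 "image"
k = np.ones((3, 3)) / 9.0                        # simple averaging kernel
print(conv2d(x, k).shape)                        # (3, 3); with stride=2 it would be (2, 2)
```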

Pooling layers: A pooling layer performs sub-sampling on its input to produce smaller feature maps while retaining the dominant features. There is a wide variety of pooling methods [93]; however, the most prevalent are max pooling and global average pooling.

Fully connected layers: The output feature map is flattened and used as an input to the Fully Connected layer. Each neuron in this layer is connected to every neuron in the previous layer, thus the name. They are usually found at the end of a CNN [99].

Activation function: The activation function adds non-linearities to CNNs, which increase the expressive power of the model. There are several commonly used activation functions. The Rectified Linear Unit (ReLU), hyperbolic tangent (tanh), and sigmoid functions are usually applied in the hidden layers (after the convolutional layers), while the SoftMax function is mostly used on the output after the FC layer [99, 100] in classification problems. The SoftMax function also ensures that the outputs can be regarded as probability values, by making them non-negative and summing to one; these probabilities can be regarded as a confidence measure for an input belonging to a certain class. Detailed descriptions of these activation functions can be found in [93, 99, 100].
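Putting these pieces together, the following is a minimal, hypothetical PyTorch sketch of such a CNN: stacked convolution-pooling blocks for feature extraction and a fully connected layer with SoftMax for classification. The channel counts, kernel sizes, and input shape are illustrative assumptions and are not taken from any reviewed study.

```python
# Minimal sketch of a CNN for EEG classification (assumed shapes and sizes).
import torch
import torch.nn as nn

class SimpleEEGCNN(nn.Module):
    def __init__(self, n_channels=8, n_samples=512, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(                     # feature extraction stage
            nn.Conv1d(n_channels, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(64 * (n_samples // 8), n_classes)   # classification stage

    def forward(self, x):                       # x: (batch, channels, samples)
        z = self.features(x).flatten(1)         # flatten the feature maps
        return torch.softmax(self.classifier(z), dim=1)

probs = SimpleEEGCNN()(torch.randn(4, 8, 512))  # -> (4, 2) class probabilities
```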

4.2 Long short-term memory networks

Long Short-Term Memory networks (LSTMs) are a special kind of Recurrent Neural Network, which in turn is an extension of neural networks with recurrent connections added in the hidden layers. The recurrent connections produce temporal memory, as shown in Fig. 7. It is worth noting that RNNs face limitations during the backpropagation of errors used to update the network weights, namely the problems of vanishing or exploding gradients. These limit RNNs to learning only short temporal dependencies [101].

Fig. 7. A recurrent neural network (left) and its unfolded architecture (right). U, V, and W are the weights of the hidden layer, the output layer, and the hidden state, respectively; xt and Ot are the input vector and the output at time t, respectively [102]

LSTMs introduce gates to solve these issues. LSTMs have memory cell state blocks through which signals flow and are guided by input, forget, and output gates. These gates regulate what is to be saved, read, and written on the cell [103]:

a. Forget gate

The forget gate is the first step, which involves determining what information should be wiped out from the cell state. Mathematically it can be expressed as follows:

$$f_{t} = \sigma \left( W_{f} \cdot [h_{t-1}, x_{t}] + b_{f} \right)$$

where ft is the output of the forget gate at time instant t, ht−1 and xt are the previous hidden state and the input vector, respectively, b is the bias of each gate, σ is the sigmoid function, and the W's are the learnable weight parameters.

b. Input and update gates

This layer decides what values are going to be stored or added to the cell state. Mathematically this can be expressed as follows:

$$i_{t} = \sigma \left( W_{i} \cdot [h_{t-1}, x_{t}] + b_{i} \right)$$
$$\widetilde{C}_{t} = \tanh \left( W_{C} \cdot [h_{t-1}, x_{t}] + b_{C} \right)$$

where it represents the output of the input gate and \(\widetilde{C}_{t}\) is the candidate cell state, computed from the current input and the previous hidden state.

To update the cell state \(C_{t}\), the outputs of the forget and input gates are combined with the candidate state:

$$C_{t} = f_{t} * C_{t-1} + i_{t} * \widetilde{C}_{t}$$

where \(C_{t}\) is the internal memory of the unit, obtained by multiplying the previous memory \(C_{t-1}\) by the forget gate ft and adding the product of the candidate state \(\widetilde{C}_{t}\) and the input gate it.

c. Output gate

A sigmoid layer first decides which parts of the cell state are going to be output; the cell state is then passed through a tanh and scaled by this gate. This can be expressed as follows:

$$o_{t} = \sigma \left( W_{o} \cdot [h_{t-1}, x_{t}] + b_{o} \right)$$
$$h_{t} = o_{t} * \tanh \left( C_{t} \right)$$

A bidirectional LSTM, denoted BiLSTM, propagates the signal in both directions, forward as well as backward [104]. A typical internal structure of the LSTM model can be found in [105], while each gate has been extensively explained in many studies [103, 104, 106,107,108].
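To make the gate equations above concrete, the following NumPy sketch implements a single LSTM cell step; the dimensions and random weights are purely illustrative assumptions.

```python
# Minimal sketch of one LSTM cell step, implementing the gate equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One time step. Each W maps the concatenated [h_{t-1}, x_t] to one gate."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde          # cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate
    h_t = o_t * np.tanh(c_t)                    # new hidden state
    return h_t, c_t

d_in, d_hid = 4, 8                              # assumed input and hidden sizes
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((d_hid, d_hid + d_in)) for k in "fico"}
b = {k: np.zeros(d_hid) for k in "fico"}
h, c = lstm_step(rng.standard_normal(d_in), np.zeros(d_hid), np.zeros(d_hid), W, b)
```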

5 Architecture design choices

This portion of the review focuses on identifying patterns in the development of various deep learning architectures. In general, the factors to keep in mind during design are the architecture type and the hyperparameters, such as the number of hidden layers, the kind of activation functions, and the type of end classifier. In this review, we focus on the type of architecture. The most predominant architecture types observed in the reviewed papers were CNNs, RNNs (LSTMs), and some hybrid architectures. The architecture design choices of each paper reviewed in this section and their results are summarized in Table 2.

Table 2 Summary of all studies reviewed in this paper

5.1 Convolutional neural networks

CNNs are the most popular architectural framework applied to detect mental stress [109]. They involve alternating convolutional and pooling layers. The type and number of layers, the structure of the layers, and the type of final classifier are the most important design features of CNNs. Several papers have proposed classifying mental stress from raw EEG data and have achieved competitive outcomes, as reviewed below.

5.1.1 Raw EEG with CNN

Jebelli et al. [67] proposed the use of a deep CNN to predict the stress level of 10 construction workers exposed to several stressors at actual construction sites. The deep CNN architecture used had four blocks of convolution and pooling layers, while a final fully connected dense layer with two softmax units was used for classification. Figure 8 shows the architecture developed. The CNN detected two levels of stress (low and high) from raw EEG signals with an accuracy of 64.20%. The results were compared with a state-of-the-art fully connected deep neural network with two hidden layers of 83 and 23 neurons, one input layer, and one output layer, which achieved an accuracy of 86.62%. The paper claimed that increasing the number of hidden layers of the architecture would not necessarily improve the accuracy of stress detection.

Fig. 8. Deep convolutional neural network architecture to recognize construction workers' stress levels based on their EEG signals [67]

In another study on detecting mental state, Zeng et al. [69] constructed two novel classifiers, EEG-Conv and EEG-Conv-R, to differentiate between two mental states, vigilance and boredom. EEG-Conv is based on a traditional deep CNN composed of eight layers: an input layer, three convolutional layers, a pooling layer, a local response normalization layer, a fully connected layer of 2048 neurons (to which a dropout strategy was applied to prevent overfitting), and an output layer. Each neuron in the CNN had a ReLU activation function, which in this paper was stated to be more efficient than the sigmoid and tanh functions. On the output layer, logistic regression was used as a linear, probabilistic classifier. The architecture of EEG-Conv is illustrated in Fig. 9.

Fig. 9. Proposed traditional EEG-Conv classifier architecture to differentiate between two mental states (vigilance and boredom), using logistic regression for classification [69]

To enhance classification accuracy, this study integrated Convolutional Neural Networks (CNNs) with state-of-the-art deep residual learning techniques, resulting in the creation of a novel classifier known as EEG-Conv-R. The EEG-Conv-R architecture is established by the incorporation of two residual blocks into the existing EEG-Conv structure, as visually represented in Fig. 10. In this configuration, each layer within the network employing residual blocks propagates its output not only to the immediate subsequent layer but also directly to layers positioned two to three steps further ahead. These introduced "skip" or "shortcut" connections serve as a strategic solution to combat the vanishing gradient issue, thus enabling the effective training of significantly deeper networks.

Fig. 10. Proposed EEG-Conv-R classifier with residual learning [69]

Raw EEG data from people subjected to a driving simulation were fed directly to the EEG-Conv and EEG-Conv-R architectures, achieving accuracies of 82.95% and 84.38%, respectively. These results outperformed traditional LSTM- and SVM-based classifiers.

Penchina et al. [70] investigated the use of EEGNet (a CNN) to discriminate between anxious and non-anxious states in neurotypical people and people on the autism spectrum. The subjects were given arithmetic tasks to induce stress, while relaxation was achieved through guided and unguided breathing periods. The EEGNet architecture developed consisted of eight 2D convolution filters, one depth-wise convolution layer, one separable convolution layer, and a final dense layer of four neurons with a SoftMax activation function for classification, as shown in Fig. 11. EEGNet was able to detect stress from raw EEG signals with an accuracy of 60.21%. As stated by the paper, the limited accuracy may be due to several reasons: the tasks were simplified so as not to overstimulate people with autism, and the guided/unguided breathing periods were longer than the stress periods, causing an unbalanced dataset.

Fig. 11. Proposed EEGNet CNN classifier architecture to discriminate between anxious and non-anxious states in neurotypical people [70]

Sundaresan et al. [72] conducted further studies to compare the efficiency of the previously proposed EEGNet with deep ConvNet and shallow ConvNet architectures. The deep ConvNet consisted of four convolution-max-pooling blocks: the first block contained 25 2D temporal convolutional filters, 25 2D spatial convolutional filters, and a max-pooling layer, while the subsequent three blocks each contained a 2D convolutional layer and a max-pooling layer, with 50, 100, and 200 filters per block, respectively. All neurons except those in the final layer used the Exponential Linear Unit (ELU) activation function, while a SoftMax activation function was used on the final dense layer for classification.

On the other hand, the shallow ConvNet is a modified version of the deep ConvNet, consisting of a single convolution-pooling block, a squaring non-linearity, an average pooling layer, and a logarithmic activation function. The model architectures used are shown in Fig. 12. The EEGNet, deep ConvNet, and shallow ConvNet architectures were able to detect stress with accuracies of 61.18%, 58.80%, and 62.84%, respectively.

Fig. 12. Proposed CNN classifier of 4 convolution layers vs. CNN classifier of a single convolution layer [72]

In a more recent study [110], Fu et al. proposed a novel deep learning model that differentiates between four mental states (relaxed, medium stress, high stress, and stress recovery). The model, named Symmetric Deep Convolutional Adversarial Network (SDCAN), merges CNNs with adversarial theory, which helps to automatically extract invariant and discriminative features from raw EEG to enhance classification and subject generalization. The model is composed of two symmetrical CNNs that serve as the generator and discriminator, as shown in Fig. 13; the discriminator consisted of four convolution-max-pooling blocks attached to five deconvolution layers. The subjects were placed under the Trier Social Stress Test (TSST) stressor while raw EEG was collected. In this comparison, the SDCAN model achieved an accuracy of 87.62%, outperforming a traditional CNN. In addition, Abhishek and Nallavan proposed mental stress assessment in sports [111].

Fig. 13. Proposed SDCAN model that merges CNN and adversarial theory [110]

5.1.2 Spectral images or topological maps with CNN

Apart from the aforementioned CNN architectures that used raw EEG signals as input, there are a few papers that proposed the use of spectral images or topological maps and achieved competitive outcomes, as reviewed below.

EEG data can be represented as 2D or 3D images in the topological map input formulation, depending on the spatial topology of the electrodes, i.e., their positions on the scalp [112]. Martínez-Rodrigo et al. [73] proposed a CNN model that used EEG signals represented as topological maps of the scalp, which were fed as images to the CNN. The power of every EEG channel was obtained individually for each frequency sub-band (alpha, beta, gamma, and theta) and for the whole band (4 Hz to 45 Hz). The resulting powers were then normalized.

The obtained powers Pt, Pθ, Pα, Pβ, and Pγ were initially transformed into 2D images using three different mapping approaches, namely Direct Matrix Distribution (DMD), Direct Matrix Distribution interpolated (DMDi), and Azimuthal Equidistant Projection (AEP). All three mapping approaches used a jet colormap with 256 colors, ranging from dark red (maximum value) to dark blue (minimum value), to represent the spectral power values. The 2D maps obtained from the power parameters were then stacked to form 3D images for each of the three mapping approaches separately. The authors proposed an AlexNet-based CNN model to differentiate two mental states, distress and calm. The 2D AlexNet-based CNN model proposed in this study consisted of five convolution layers, three max-pooling layers, and three FC layers (containing two drop-out layers to prevent overfitting and improve generalization error) with 4096 neurons in each layer; a ReLU activation function was used after every convolutional and FC layer. The 2D AlexNet-based CNN model is shown in Fig. 14. To handle 3D images as input data, the same CNN architecture was used, but the size of the convolution and max-pooling layers was extended. The authors found the combination of DMD with the 3D CNN to be slightly better than the other combinations, with an accuracy of 86.12%.

Fig. 14. The proposed 2D AlexNet CNN model architecture with five convolution layers to differentiate between distress and calm [73]

Kamińska et al. [77] applied the Morlet wavelet transform to the values averaged across all electrodes to generate time–frequency representation images, considering the frequency range from 1 to 28 Hz. These time–frequency images were used as input to a CNN-based deep learning model, achieving an accuracy of 87.5% in detecting stressed and relaxed states. The CNN architecture, shown in Fig. 15, had three convolution layers with 250, 250, and 100 neurons, respectively, and a final dense FC layer with 100 neurons. It was stated that the stressful state could only be recorded after the first relaxation period, which may have been because participants did not yet feel comfortable with the situation or were stressed by the new experience.

Fig. 15. Structure of the CNN classifier using a heat-map image to detect stress [77]

A more recent study by Mane et al. [113] investigated the use of a 2D azimuthal projection to create 2D images as input to a CNN model. Raw EEG signals were first segmented into frames and a Hanning window was applied; the Fast Fourier Transform (FFT) was then used to obtain the frequency-domain representation. Next, frequency binning was performed to group the signal into three frequency ranges. Finally, RGB channel values were used to represent the alpha, beta, and theta values and form images. The authors proposed a CNN model consisting of 10 2D convolution layers and a max-pooling layer feeding a final flattened dense layer for classification between stressed and normal states. This study achieved an accuracy of 93.0%.
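As a simplified, hypothetical illustration of this kind of image formulation (without the azimuthal equidistant projection or inter-electrode interpolation used in the actual studies), the sketch below writes normalized per-electrode theta, alpha, and beta powers into the R, G, and B channels of a small image at assumed grid positions.

```python
# Simplified sketch: map per-electrode band powers to an RGB image for a CNN.
import numpy as np

def band_image(theta, alpha, beta, positions, size=32):
    """theta/alpha/beta: (n_electrodes,) band powers; positions: (n_electrodes, 2) in [0, 1]."""
    img = np.zeros((size, size, 3))
    stacked = np.stack([theta, alpha, beta], axis=-1)                    # (n_electrodes, 3)
    stacked = (stacked - stacked.min(axis=0)) / (np.ptp(stacked, axis=0) + 1e-12)
    for (x, y), rgb in zip(positions, stacked):
        r, c = int(y * (size - 1)), int(x * (size - 1))
        img[r, c] = rgb                                                  # one colored pixel per electrode
    return img

n_el = 14                                        # assumed electrode count and positions
rng = np.random.default_rng(1)
img = band_image(rng.random(n_el), rng.random(n_el), rng.random(n_el),
                 rng.random((n_el, 2)))          # -> (32, 32, 3) image
```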

5.2 Long short-term memory networks

As with the CNN architecture, designing an LSTM architecture mainly focuses on the type and number of layers as well as the type of final classifier. LSTM models are considered the second most prevalent deep neural network models, after CNNs, in classifying mental stress with EEG signals.

Penchina et al. [70] proposed the use of an LSTM network with two layers to detect anxiety in neurotypical people and people with autism. According to this study, several papers have suggested using LSTMs with two layers, stating that this improves classification accuracy compared to using only one layer or more than two. They achieved an accuracy of 93.27%. The proposed LSTM design, shown in Fig. 16, consisted of two LSTM layers with 50 neurons in the first layer and 40 neurons in the second, two dropout layers with a rate of 0.5, two hidden dense layers (the first with 20 neurons and a sigmoid activation function, the second with 10 neurons and ReLU), followed by one output dense layer with a SoftMax function for classification. The number of neurons in both LSTM layers was chosen according to the literature [114].

Fig. 16. The proposed two-layered LSTM model to detect anxiety in neurotypical people and people with autism [70]
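A hypothetical PyTorch sketch mirroring this two-layer LSTM design is given below; the layer sizes and dropout rate follow the description above, while the input shape, number of classes, and use of the last time step as the sequence summary are assumptions.

```python
# Sketch of a two-layer LSTM classifier of the kind described above.
import torch
import torch.nn as nn

class TwoLayerLSTM(nn.Module):
    def __init__(self, n_features=8, n_classes=2):
        super().__init__()
        self.lstm1 = nn.LSTM(n_features, 50, batch_first=True)   # first LSTM layer (50 units)
        self.lstm2 = nn.LSTM(50, 40, batch_first=True)           # second LSTM layer (40 units)
        self.drop = nn.Dropout(0.5)
        self.head = nn.Sequential(
            nn.Linear(40, 20), nn.Sigmoid(),                     # dense layer with sigmoid
            nn.Linear(20, 10), nn.ReLU(),                        # dense layer with ReLU
            nn.Linear(10, n_classes),
        )

    def forward(self, x):                    # x: (batch, time, features)
        x, _ = self.lstm1(x)
        x, _ = self.lstm2(self.drop(x))
        z = self.drop(x[:, -1, :])           # last time step as the sequence summary
        return torch.softmax(self.head(z), dim=1)

probs = TwoLayerLSTM()(torch.randn(4, 256, 8))   # -> (4, 2) class probabilities
```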

A more recent study [72] was performed after [70] to further compare the results of the aforementioned two-layered LSTM model with additional CNN architectures such as EEGNet (CNN), deep ConvNet, and shallow ConvNet, which achieved accuracies of 61.18%, 58.80%, and 62.84%, respectively. In this paper, the LSTM model outperformed the CNNs in evaluating mental states from raw EEG signals. The authors also asserted that the performance of the LSTM network remained consistent regardless of the presence of mental disorders, producing similar results; it therefore proves effective in detecting stress in both neurotypical individuals and those with autism. In a more recent study [115], Phutela et al. analyzed two levels of mental stress using a two-layer LSTM model, achieving an accuracy of approximately 99.71%. Stress was induced by watching emotional video clips, while raw EEG signals were collected from the subjects and fed to the LSTM model. The authors implemented an LSTM model containing an initial 8-neuron LSTM layer (LSTM 1) followed by a second 16-neuron LSTM layer (LSTM 2). A dropout layer was then added to prevent overfitting and noise learning. Finally, for classification purposes, a fully connected dense layer with a sigmoid activation function was used.

5.3 Hybrid architectures

Hybrid deep learning models combine two or more deep learning models into a single architecture [112]. Researchers have attempted to integrate several deep learning networks, including the standalone deep learning models discussed above (CNNs and LSTMs), obtaining promising findings for identifying mental stress.

For instance, Kuanar et al. [71] implemented the concept of hybrid architecture where a CNN (ConvNet) model and an LSTM model were integrated to preserve the spectral, spatial, and temporal structure of EEG data. The authors also focused on extracting spatial-frequency images, or spectral images, from EEG signals to be fed to the hybrid system. Fast Fourier Transform was applied to estimate the EEG’s power spectrum, and only three sub-bands, theta (4–8 Hz), alpha (8–13 Hz), and beta (13–30 Hz) were chosen, as several literature studies recommended. Later, the sum of the squared absolute power values for each sub-band associated with each electrode was calculated and transformed into 2D images with corresponding color channels to represent the spectral dimensions. The resulting topographical map is fed as an input to the ConvNet (CNN) model that extracts feature vectors which are fed to the LSTM model, as shown in Fig. 17.

The proposed hybrid model in Fig. 17 consisted of a CNN model with nine convolutional layers, three max-pooling layers, and one fully connected layer, all using the ReLU activation function. For the LSTM part, three models were suggested: a two-layered LSTM network, a two-layered LSTM network with a 1D convolutional layer, and a bidirectional LSTM network. Note that bidirectional LSTMs can process EEG data in both directions using two separate hidden layers. For the final layer of each model, a fully connected layer with a SoftMax function was used to perform classification. When their performances were compared, ConvNet + LSTM, ConvNet + LSTM + 1D Conv, and ConvNet + Bi-LSTM achieved accuracies of 84.48%, 87.68%, and 92.5%, respectively, in detecting four mental states. The proposed ConvNet + Bi-LSTM model is shown in Fig. 18.

Fig. 17. The proposed model utilizes 2D topographical images as input to the hybrid ConvNet BiLSTM model [71]

Fig. 18. Hybrid LSTM model used in the study: ConvNet + Bi-LSTM [71]

Chakladar et al. [75] analyzed three levels of mental stress using a hybrid of bidirectional LSTM and LSTM networks. The authors extracted a hybrid of frequency, statistical, and non-linear features, and the Grey Wolf Optimization technique was used to choose the best ones. These features were fed to the hybrid deep learning model shown in Fig. 19, which was proposed to have a single bidirectional LSTM layer followed by two LSTM layers. Each layer was followed by a dropout layer with a rate of 0.2 and a batch normalization layer, to prevent overfitting and normalize the output. Finally, for classification purposes, two consecutive dense layers were used. The classification was performed on the STEW dataset, yielding an accuracy of 82.57%.

Fig. 19. The proposed bidirectional LSTM-LSTM classifier model for detecting three levels of mental stress [75]
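For illustration, the following is a hypothetical PyTorch sketch of a BiLSTM + LSTM stack of this kind, operating on sequences of pre-extracted feature vectors; the feature dimension, hidden sizes, and sequence length are assumptions, while the 0.2 dropout rate and batch normalization follow the description above.

```python
# Sketch of a BiLSTM followed by two LSTM layers for three-level stress classification.
import torch
import torch.nn as nn

class BiLSTMLSTM(nn.Module):
    def __init__(self, n_features=20, hidden=64, n_classes=3):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.lstm1 = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.lstm2 = nn.LSTM(hidden, hidden, batch_first=True)
        self.drop = nn.Dropout(0.2)
        self.bn = nn.BatchNorm1d(hidden)
        self.head = nn.Sequential(nn.Linear(hidden, 32), nn.ReLU(), nn.Linear(32, n_classes))

    def forward(self, x):                    # x: (batch, time, features)
        x, _ = self.bilstm(x)
        x, _ = self.lstm1(self.drop(x))
        x, _ = self.lstm2(self.drop(x))
        z = self.bn(x[:, -1, :])             # normalize the last-step representation
        return torch.softmax(self.head(z), dim=1)

probs = BiLSTMLSTM()(torch.randn(4, 30, 20))   # -> (4, 3) for three stress levels
```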

In another study [72], Sundaresan et al. proposed a hybrid of an LSTM model and a fully convolutional network to detect anxiety in neurotypical people and people with autism. The raw EEG signal was fed simultaneously to two blocks—an LSTM block and a fully convolutional network—as shown in Fig. 20. The LSTM block was composed of a single LSTM layer followed by a dropout layer with a rate of 0.8, while the fully convolutional network block had three 1D convolutional layers of different sizes followed by a pooling layer. Finally, the output of each block was concatenated and fed into a 4-neuron dense output layer with SoftMax activation for classification. The proposed model achieved a relatively low accuracy of 62.97%, which can be explained by the simplified stressor task intended not to overstimulate the subjects with autism. It is noteworthy, however, that the results were not affected by the presence of mental disorders, producing similar outcomes. Moreover, the dataset collected was unbalanced because of longer breathing periods compared to the stressor periods. Finally, LSTM-based deep learning models are usually over-reliant on large datasets.

Fig. 20. Diagram of the proposed LSTM-fully convolutional network architecture, where the outputs of the convolution block and the LSTM block are concatenated for classification [72]

In a more recent study [116], Malviya et al. investigated the use of a hybrid CNN-bidirectional LSTM for detecting mental stress. The subjects performed mental arithmetic tasks to induce stress. The Discrete Wavelet Transform (DWT) was used to filter the raw EEG signals and divide them into five frequency bands. The proposed hybrid architecture, shown in Fig. 21, consisted of a CNN model with two convolution layers, two max-pooling layers, and one FC layer, while the BiLSTM model was composed of three LSTM layers with two cells in each layer and 64 neurons. The output was fed into a dropout layer and then a dense layer with a SoftMax unit for classification. The hybrid system was able to discriminate stress from the relaxation state with an accuracy of 99.2%.

Fig. 21. Diagram of the proposed hybrid CNN-BLSTM used to classify stress from DWT-filtered EEG [116]
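A hypothetical sketch of this kind of hybrid CNN-BiLSTM is given below: a small convolutional front end extracts local features and a three-layer bidirectional LSTM models their temporal evolution before a SoftMax output. Filter counts, kernel sizes, and the input shape are assumptions and do not reproduce the exact model of [116].

```python
# Sketch of a hybrid CNN-BiLSTM stress classifier (assumed sizes).
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, n_channels=8, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(                       # convolutional front end
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.bilstm = nn.LSTM(64, 64, num_layers=3, batch_first=True,
                              bidirectional=True, dropout=0.2)
        self.drop = nn.Dropout(0.3)
        self.fc = nn.Linear(2 * 64, n_classes)          # 2x for the two directions

    def forward(self, x):                               # x: (batch, channels, samples)
        z = self.cnn(x).transpose(1, 2)                 # -> (batch, time, features)
        z, _ = self.bilstm(z)
        z = self.drop(z[:, -1, :])                      # last time step
        return torch.softmax(self.fc(z), dim=1)

probs = CNNBiLSTM()(torch.randn(4, 8, 512))             # -> (4, 2)
```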

Meanwhile, Xia et al. [117] investigated the use of a multi-branch LSTM merged with hierarchical temporal attention (MuLHiTA). Raw EEG signals collected from subjects performing mental arithmetic tasks were fed to the proposed model, which consisted of two complementary branches: an Intra-LTAM layer (an Intra-BLSTM and a temporal attention mechanism) and an Inter-LTAM layer (an Inter-BLSTM and a temporal attention mechanism), where each layer contains an attention module and a fully connected layer. Note that 'intra' and 'inter' in the LTAM networks refer to intraslice and interslice feature extraction, respectively. A concatenation layer is then used to aggregate the outputs of the two branches, and finally an FC layer with SoftMax is used for classification. The classification between two mental states (rest and stress) was performed on the DMAT dataset, yielding an accuracy of 99.71%.

A summary of all the studies that have been reviewed in this paper can be found in Table 2.

6 Discussion

In this section, we engage in a comprehensive analysis of the statistical findings derived from the data extracted from the reviewed papers. We aim to provide a clear and insightful interpretation of these statistics, shedding light on key trends and patterns observed in the field of mental stress classification using EEG signals.

Furthermore, we extend our focus beyond the immediate statistical insights. We offer valuable recommendations for designing robust deep-learning architectures and optimizing essential parameters.

However, our commitment to advancing the field does not cease in the present. Recognizing that research is continually evolving and expanding, we present forward-looking recommendations. These suggestions outline the potential future directions that could take the study of stress to new levels of understanding. Our aim is to promote a more advanced, comprehensive, and multidimensional approach to stress research, deepening our understanding of this area of study.

6.1 Deep learning architecture

6.1.1 Input formulation

Choosing the type of EEG input formulation depended to a great extent on the type of deep learning architecture utilized to detect mental stress as shown in Fig. 22.

Fig. 22. The number of papers that used different input formulations

For CNN-based deep learning architectures, two common types of input formulation were used by the studies reviewed in this paper to classify mental stress: raw EEG signals and spectral/topological maps. The two were used equally often in the reviewed papers; however, the average accuracy achieved by CNNs with spectral/topological maps across all reviewed papers (84.73%) outperformed that achieved by CNNs using raw EEG signals (67.79%), as shown in Fig. 23.

Fig. 23. Accuracy of using raw EEG signals or spectral/topological maps as input to a CNN model

This result can be linked to CNNs' ability to extract higher representations or features from image content. Moreover, using raw EEG signals as input to a CNN model did not yield consistent accuracy across the reviewed papers; there was large variability in the results, ranging from 60 to 84%, raising the question of whether factors other than the raw EEG input influenced the classification accuracy.

On the other hand, LSTM-based deep learning architectures used only raw EEG signals as the input formulation, achieving accuracies ranging from 93 to 95%. This suggests that the accuracy of LSTM models in classifying mental stress from raw EEG signals is stable at around 94%, although further investigation is required.

Meanwhile, hybrid-based deep learning architectures investigated the use of three different types of input formulation. Spectral/topological maps were used in 60% of the hybrid-based studies, followed by raw EEG signals and hybrid input signals in 30% (each). The use of different input formulations reflects the ability of these hybrid architectures to process a variety of data inputs depending on the deep learning architectures combined. For instance, in [71], the CNN and LSTM models were integrated, allowing spectral topography to be used as input, whereas Chakladar et al. [75] utilized a hybrid of statistical and spectral features as input to the merged bidirectional LSTM and LSTM network. Figure 24 shows the average accuracy for the different types of input formulation with hybrid deep learning architectures.

Fig. 24. Accuracy of different types of input fed into hybrid deep learning architectures

As shown above, the type of input formulation used with a given deep learning model plays an important role in the classification result: the same deep learning technique, for example CNN, produced very different accuracies across studies that used different types of input formulation.

6.1.2 Deep learning architecture and activation function

Based on the reviewed studies, the most prevalent architecture design framework adopted was the CNN, found in 67% of the papers, as shown in Fig. 25. Several factors explain the widespread use of CNNs. First, a distinctive property of CNNs is that they do not require prior feature selection, meaning that they can extract deep, distinct features and spatial patterns directly from raw EEG signals; they can thus perform feature extraction as well as classification. Furthermore, CNN architectures can process and classify various forms of EEG input, including raw signals as well as 2D images such as spectral and topological maps.

Fig. 25. Most used deep learning models in classifying mental stress

Meanwhile, none of the reviewed papers specifically compared the use of different numbers of convolution layers in CNN models. However, a common trend can be seen in the literature: 56% of the studies used three convolution layers, followed by four layers (22%), while the remaining studies used five or ten convolution layers in equal proportion (11% each), as shown in Fig. 26. Given the large number of studies that used three convolution layers, it is recommended to employ three convolution layers in the first design of a CNN model; the potential for performance improvement can then be investigated with four layers, followed by the layer counts used less often.

Fig. 26. Number of convolution layers utilized in CNN models

In terms of activation functions, ReLU was the most used in convolutional layers (64.7%), followed by the Exponential Linear Unit (ELU) (23.5%). Meanwhile, one study explicitly investigated the effect of using a squaring non-linearity and a logarithm as activation functions [72]; no other studies have investigated their use yet, and the accuracy achieved in [72] was relatively low, at 62.84%.

For the classification layer, half of the reviewed studies have proposed one fully connected layer in the CNN models, while the other half proposed three. Therefore, it is recommended to test both while designing a CNN model for detecting mental stress, to determine which one works best. For classification purposes, all the studies utilized the SoftMax function except [69], where Zeng et al. proposed the use of logistic regression as a probabilistic classifier, achieving a classification accuracy of 84.38%.

The LSTM-based deep learning model was proposed in only 12% of the papers, which is far below expectations given that LSTMs have proven effective and have outperformed other deep learning models in the research reviewed. For instance, [70] compared the performance of LSTM and CNN models of different architectures and found that LSTMs significantly outperformed CNNs in classifying stress. Furthermore, LSTM-based deep learning models give outstanding performance when raw EEG signals are used as the input formulation, providing end-to-end learning and reducing processing time by eliminating the need for feature extraction.

Most of the reviewed studies that specified the number of LSTM layers proposed two, in line with the suggestions of several previous studies. Sundaresan et al. [72] investigated and compared the performance of three LSTM models with different numbers of LSTM layers. Accuracy improved markedly, by about 20%, when using two layers instead of one; however, using three LSTM layers instead of two reduced the accuracy by 18%, indicating that two LSTM layers yield the best performance.
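As a concrete illustration of the two-layer configuration, the following PyTorch sketch applies a stacked LSTM directly to raw EEG sequences; the channel count, hidden size, and number of classes are hypothetical.

```python
import torch
import torch.nn as nn

class StressLSTM(nn.Module):
    """Two-layer LSTM over raw EEG (sketch): input shape (batch, time, channels)."""
    def __init__(self, n_channels=19, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden,
                            num_layers=2, batch_first=True)   # two stacked LSTM layers
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x):
        out, _ = self.lstm(x)                # out: (batch, time, hidden)
        return self.classifier(out[:, -1])   # classify from the last time step

model = StressLSTM()
logits = model(torch.randn(4, 1280, 19))     # e.g. 10 s of 19-channel EEG at 128 Hz
```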

Finally, hybrid architectures were proposed in 20% of the reviewed papers for detecting mental stress. The reviewed studies did not directly compare hybrid models with standalone deep learning models, but hybrid architectures achieved high performance in several studies, with the exception of [72], where an LSTM model integrated with a fully convolutional network block yielded an accuracy of 62.97%. This relatively low accuracy is thought to be due to the use of a single LSTM layer in the model.
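As an illustration of the general CNN-LSTM pattern (not a reproduction of any specific reviewed model), the sketch below encodes each short EEG segment with a small 1D CNN and aggregates the segment embeddings with an LSTM; all sizes are hypothetical.

```python
import torch
import torch.nn as nn

class HybridCNNLSTM(nn.Module):
    """Sketch of a hybrid model: per-segment CNN encoder + LSTM over segments."""
    def __init__(self, n_channels=19, n_classes=2, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(   # 1D CNN over each raw EEG segment
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x):               # x: (batch, segments, channels, time)
        b, s, c, t = x.shape
        feats = self.encoder(x.reshape(b * s, c, t)).squeeze(-1)  # (b*s, 64)
        out, _ = self.lstm(feats.reshape(b, s, -1))
        return self.classifier(out[:, -1])

model = HybridCNNLSTM()
logits = model(torch.randn(4, 10, 19, 256))  # 10 segments of 2 s at 128 Hz (assumed)
```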

6.2 Limitations

This review deliberately confines its scope to Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and their hybrid architectures, omitting other deep learning models. The emphasis is primarily on the impact of various input formulations on the performance and accuracy of these specific models. Consequently, less common models within the deep learning landscape are not explicitly addressed. While this focused approach provides an in-depth analysis of the specified models and their input variations, it inherently limits the broader applicability of the findings to a specific subset of deep learning architectures. Furthermore, the review spans January 2017 to November 2022 and reflects the research landscape during that period. The limitations imposed by these exclusions and temporal constraints may affect the overall comprehensiveness of the review, and readers are advised to interpret the findings within these boundaries.

6.3 Future recommendations

This paper has helped clarify the relationship between different deep learning models and EEG input formulations for classifying mental stress. Moving forward, several areas can be considered for further research and exploration. A visual representation of the recommended paths is presented in the flowchart in Fig. 27.

Fig. 27. Flowchart of recommendations for future work

Literature Gaps and Recommendations.

1. The review findings indicate that CNNs tend to perform best with EEG presented as spectral or topographical images, while LSTM models are typically applied to raw EEG data. However, previous studies have predominantly used spectrograms or connectivity maps as the input images for CNNs. To advance the field, we propose exploring 3D spatiotemporal images as input. By converting EEG connectivity features into images, we anticipate a better capture of spatial and temporal dynamics, potentially leading to improved classification results.

2. Although hybrid designs have been recommended in only a few studies, these instances have demonstrated high performance. Surprisingly, there is a lack of comparative research on the performance of hybrid models versus standalone models. We recommend conducting comprehensive investigations to assess the effectiveness of hybrid architectures in EEG classification tasks as this is an underexplored area with potential benefits.

3. We also recommend exploring novel practices such as attention-based deep learning models and the application of graph neural networks such as Graph Convolutional Networks (GCNs); a minimal sketch of a graph convolution layer is given after this list. The GCN approach holds promise for advancing our understanding of brain topography and connectivity while improving classification accuracy, particularly in scenarios with irregular or non-Euclidean data such as EEG signals. Investigating attention-based deep learning models as a means of incorporating additional information into the classification process can also help, as these models can identify the elements of the data that matter most for classification. While initial investigations have begun in a few papers [118,119,120], there is room for further exploration of these approaches in various EEG classification scenarios.

4. Exploring innovative approaches such as LSTM-ALO and CNN-INFO holds significant promise. CNN-INFO couples CNNs with an optimization algorithm based on the weighted mean of vectors, offering a promising avenue for stress detection. By integrating the CNN's ability to process visual data with this weighted-vector optimization, CNN-INFO aims to capture comprehensive representations of stress-related patterns while remaining adaptable to varying input conditions. The approach facilitates enhanced feature aggregation, robustness to data variability, and improved interpretability, making it well suited for real-world stress detection applications. As future research progresses, evaluating CNN-INFO across different datasets and exploring its potential extensions could further solidify its role in advancing stress detection methodologies [121]. Similarly, the LSTM-ALO algorithm, a hybrid model combining the Long Short-Term Memory (LSTM) network with the Ant Lion Optimizer (ALO), has showcased its efficacy in optimizing learning rates and facilitating rapid convergence in various optimization tasks [122].
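To make recommendation 3 more tangible, the following is a minimal sketch of a single graph convolution layer operating on EEG electrodes as graph nodes. The adjacency matrix (e.g. derived from a functional connectivity estimate) and all dimensions are hypothetical, and the sketch follows the standard GCN propagation rule rather than any specific reviewed implementation.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One GCN layer: H' = act( D^-1/2 (A + I) D^-1/2 H W ) -- a sketch."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, h, adj):
        # h:   (nodes, in_features) node features, e.g. band powers per electrode
        # adj: (nodes, nodes) adjacency, e.g. thresholded connectivity (assumed)
        a_hat = adj + torch.eye(adj.size(0))           # add self-loops
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        norm = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        return torch.relu(self.linear(norm @ h))

# Hypothetical usage: 19 electrodes, 5 band-power features each.
h = torch.randn(19, 5)
adj = (torch.rand(19, 19) > 0.7).float()               # placeholder connectivity graph
adj = ((adj + adj.t()) > 0).float()                    # make it symmetric
out = GraphConv(5, 16)(h, adj)                         # (19, 16) node embeddings
```

Stacking such layers and pooling over nodes would yield a graph-level representation that a standard classification head could map to stress labels.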

7 Conclusion

In conclusion, this paper has conducted an extensive review of deep learning models applied to classifying mental stress from EEG signals. We have thoroughly examined how deep network designs vary with the type of input formulation and the deep learning technique used. It is evident that the choice of input formulation significantly impacts classification outcomes when paired with specific deep learning models; as a result, this paper has focused primarily on the relationship between EEG input formulation and deep learning models. Our findings highlight the effectiveness of CNNs with spectral and topographical images, while LSTM models have demonstrated their capability with raw EEG data.

In addition to the type of input formulation, architectural details such as the most common number of layers for each type of deep learning model and the choice of activation function were also explored. The resulting recommendations are based on a comprehensive analysis of the design choices made across the reviewed studies. With these observations, we aim to offer practical guidance for researchers undertaking similar studies in selecting the most suitable model and adjusting its parameters to the nature of their data, and thus achieving better results.

We have also highlighted the potential of hybrid architectures and emphasized the need for more in-depth research in this area, since it is not widely covered and its performance relative to standalone models has not been investigated. We further proposed the application of Graph Convolutional Networks (GCNs) to improve classification results, as they are well suited to the non-Euclidean structure of EEG data. As the field continues to evolve, it is crucial to explore novel approaches and combinations to enhance accuracy, interpretability, and our overall understanding of EEG classification. By exploring GCNs, researchers can unlock new dimensions in mental stress classification, leading to more effective diagnostic and therapeutic applications. We anticipate that this paper will serve as a solid foundation for future research in deep learning and EEG-based mental stress classification, contributing to advancements in mental health assessment and treatment and ultimately benefiting individuals dealing with stress-related conditions.