1 Introduction

People nowadays spend an increasing amount of time on social media, online shopping, online video games, and other online activities as a result of the accessibility of numerous electronic devices in their daily life. Affects, which include feelings and moods, are present in almost every aspect of our day-to-day lives and play a crucial role in human cognition, communication, and decision-making. They also play a significant role in the interactions between humans and machines (Lerner et al. 2015). However, most modern human–computer interaction (HCI) systems lack emotional intelligence, since they are unable to understand or analyze emotional data: they cannot recognize human emotions and use this understanding to decide what to do.

For highly intelligent HCI to succeed, the lack of affective understanding between humans and machines must be addressed. Human affective states must be considered for any HCI system to function successfully. To solve this issue, we need to give systems the ability to decipher and comprehend human emotional states. Therefore, a trustworthy, accurate, flexible, and dependable emotion identification system is a prerequisite for intelligent HCI. Researchers working in the area of artificial intelligence (AI) are investigating emotion recognition and affective computing in general, with the eventual objective of equipping machines with emotional understanding. Among HCI technologies, BCI is crucial for recognizing emotions. Emotions are conveyed and understood by humans through nonverbal cues such as facial expressions (Ekman et al. 1987), speech (Ayadi et al. 2011), gesture (Kipp and Martin 2009), texts (Alm et al. 2005), physiological signals (Picard et al. 2001) and their combinations (Kessous et al. 2010). Physiological signals are harder to manipulate and may more accurately reflect an individual's true feelings than other modes of communication. Emotion identification via BCI is promising because it could improve human cognition, communication, decision-making, and health by monitoring and regulating the brain's emotional state. It also directly measures the condition of the brain, where emotions originate (Wu et al. 2023). The operation of BCI is elaborated below.

1.1 Brain computer interface

The Brain Computer Interface is an effective interface technology in the HCI field that connects the human brain and the outside world by decoding thoughts and eliminating the need for conventional information delivery methods (Kaur and Singh 2003). It allows direct signal transmission from the neurons to an external device or system (Mudgal et al. 2020). BCI algorithms look for patterns in brainwaves and execute actions based on what they find. This method enables people to engage with their surroundings without using their peripheral nerves or muscles (Stegman et al. 2020). BCI uses a variety of neuroimaging techniques for clinical practice and research labs. These techniques are categorized by the type of brain activity they measure, including electrical, magnetic, and metabolic activity, as shown in Fig. 1.

Fig. 1 Different neuroimaging techniques used in BCI

Brain electrical activity can be measured via EEG, electrocorticography (ECoG) and intracortical neuron recording (INR), and magnetic activity can be measured via magnetoencephalography (MEG), whereas brain metabolic activity can be measured using functional magnetic resonance imaging (fMRI), near-infrared spectroscopy (NIRS), positron emission tomography (PET), diffuse optical imaging (DOI), computed axial tomography (CAT) and event-related optical signal (EROS) techniques. Among these, electrical measuring techniques, despite having a high level of noise, provide high temporal resolution, whereas metabolic signals, while providing high spatial resolution, demand substantial resources (such as large, expensive scanners), have low temporal resolution, and involve high computational complexity. Therefore, among these approaches, EEG-based BCIs are the most widely used, as they are portable, non-invasive, affordable, and have high temporal resolution.

A BCI-based system consists of four components: signal acquisition, pre-processing, translation, and feedback or output, as shown in Fig. 2. The signal acquisition step uses a variety of invasive, semi-invasive and non-invasive approaches to acquire brain signals. Invasive and semi-invasive procedures record brain signals by inserting devices directly into the human brain or into the skull, while non-invasive techniques record signals by placing devices on the scalp.

Fig. 2 Components of BCI system

After acquisition, signals are sent to the pre-processing stage, where noise reduction and artefact correction/removal are performed to enhance the raw signal. The translation phase detects discriminative information in the signal, after which different features are retrieved and mapped onto a vector. Because of overlapping and distortion in the signal, extracting this important information is a difficult operation. The size of the feature data is reduced before it is given to the translation algorithm, which lowers complexity without sacrificing significant information. The selection of good discriminative features is necessary to achieve efficient pattern recognition and to interpret the user's intentions (Nicolas-Alonso and Gomez-Gil 2012). The instructions provided by the translation algorithm guide and operate the output device. This helps users reach their goals, such as controlling a mouse, selecting letters, moving a robotic arm, operating a wheelchair, moving a paralysed limb with a neuroprosthesis, and so on.

BCIs have contributed to a wide range of sectors, including education, psychology, medicine (Shih et al. 2012) and many others, as shown in Fig. 3. They were originally used to aid those who are paralyzed or partially paralyzed, but recently BCIs are being utilized by able-bodied people as well. BCI technology is being developed for both medical and non-medical applications. Recent advancements in BCI technology are aimed at developing emerging techniques in a wide range of fields.

Fig. 3 Applications of BCI

People with medical conditions often benefit from healthcare applications. Spelling applications (Elsawy et al. 2017), virtual keyboards (Salih and Abdal 2020), prosthetic equipment (Velliste et al. 2008), and smart wheelchairs (McFarland and Wolpaw 2011) can enable paralysed patients to control and communicate with their surroundings. BCI also has restorative applications for people with motor difficulties in healthcare (Tan et al. 2010). In the medical world, BCI plays a critical role in detection and diagnosis. BCI can detect many diseases more precisely than other detection techniques, including brain tumours (Sharanreddy and Kulkarni et al. 2013a, b), seizures (Sharanreddy and Kulkarni et al. 2013a, b), and sleep disturbances (Hansen et al. 2013). BCI neuroimaging technology can be used to diagnose conditions such as dyslexia (Fadzal et al. 2011) and ADHD (Lim et al. 2019), and to analyse the human gait cycle (Shafiul Hasan et al. 2020), among many others. When it comes to preventative measures, BCI neurofeedback training can help maintain attention and reduce motion sickness while driving, which can help prevent many accidents. In addition, drowsiness detection using BCI (Zhu et al. 2021) also helps to avoid accidents in several cases. Furthermore, BCI can be used to monitor stress (Perur et al. 2022), fatigue (Monteiro et al. 2019), sleep stage (Chen et al. 2018a, b), and workload (Roy et al. 2013), among other factors, in order to maintain a healthy environment and to prevent numerous mishaps (Mudgal et al. 2020). Nonmedical BCI applications fall under the domain of consumer products, which include gaming (Marshall et al. 2013), robotics (Hochberg et al. 2012), safety and security (Su et al. 2012), neuromarketing (Zgallai et al. 2019), smart home applications (Lin et al. 2014), etc. Self-control (Liu et al. 2020) and emotion recognition (Huang et al. 2021) are also among the most significant successes in BCI (Kawala-Sterniuk et al. 2021).

1.2 Emotion recognition

Emotion is a state that comprises an individual's feelings, thoughts, and actions, also known as people's psychophysiological reactions to internal or external influences. These emotions are essential for communication, perception, and decision-making processes. Therefore, emotion recognition is considered an essential machine capability in the human–machine communication area (Egger et al. 2019). If the computer can accurately identify the emotional state of the operator in real time, the interaction between human and machine will become more effective, making the system more intelligent and user-friendly. This domain of research is called affective computing, a field of artificial intelligence that concentrates on HCI through user affect detection. One of the main objectives of the affective computing domain is to develop methods for devices to comprehend human emotion, which may enhance their capacity for communication. Nowadays, emotion recognition studies have concentrated on the following areas: (1) the relationship between various physiological signals and emotions; (2) methods for choosing stimuli that will elicit the anticipated emotional states; (3) emotion-characteristic feature extraction techniques; (4) emotion modelling techniques; and (5) emotion recognition methods based on multi-modal information fusion (Zhang et al. 2020). The following sections give brief overviews of the areas required for emotion recognition applications.

1.2.1 Models for emotion representations

To develop a standard for affective computing, it is crucial to define emotion or affect. Ekman first developed the fundamental theory of basic emotions in the 1970s. Emotions are traditionally classified based on two models: the discrete and dimensional models of emotion.

1.2.1.1 Discrete model

The categorical emotion model, also known as the discrete emotion model, divides emotions into a small number of categories. Ekman's six fundamental emotions and Plutchik's emotion wheel, shown in Figs. 4 and 5 respectively, are two often-used discrete emotion models. The emotion recognition research community generally accepts Ekman's fundamental emotion model and its variations. The six fundamental emotions are usually anger, disgust, fear, happiness, sadness, and surprise (Ekman et al. 1993); combinations of these are thought to give rise to non-basic emotions including fatigue, anxiety, satisfaction, confusion, shyness, guilt, contempt and frustration. Each type of emotion has its own internal and external representations as well as physiological patterns. The following criteria were used to determine the six basic emotions: (1) basic emotions must originate in human instincts; (2) people experience the same basic emotions when confronted with similar circumstances; (3) people convey the same basic emotions using similar semantics; and (4) these fundamental emotions must have a consistent pattern of expression across all individuals. In various studies, a mixture of these discrete emotions has been used in recognition models. In Peng et al. (2022a, b), five emotional states—happy, sad, disgust, neutral, and fear—were considered for recognition, while in Fan et al. (2022) three emotional states, namely happy, calm and sad, were recognised.

Fig. 4 Ekman model of emotion

Fig. 5 Plutchik's model of emotions

Plutchik's wheel model, in contrast, considers eight fundamental emotions—joy, trust, fear, surprise, sadness, anticipation, anger, and disgust—as well as how these emotions relate to each other. In this model, also known as the componential model, stronger emotions occupy the centre of the wheel while weaker emotions lie at the extremes, depending on their respective intensity levels. For sentiment analysis, these discrete emotions can generally be divided into three categories: positive, negative, and neutral (Wang et al. 2022a).

1.2.1.2 Dimensional model

Many studies have embraced the idea of a continuous multi-dimensional model to address the problems with discrete emotion models. According to dimensional models of emotion, a variety of psychological dimensions can be combined to accurately reflect different emotional states.

Two-dimensional model: Most dimensional models take valence and arousal into account. Arousal describes the intensity of the felt emotion, whereas valence describes the degree of “pleasantness” that is connected with an emotion. The two-dimensional model of emotion is shown in Fig. 6.

Fig. 6 Two-dimensional model of emotions

Three-dimensional model: Although the two-dimensional model has no trouble differentiating positive from negative emotions, it cannot distinguish emotions that lie close together in the 2D emotion space. Mehrabian expanded the emotion model from two to three dimensions (Bakker et al. 2014). The additional dominance axis, which ranges from submissive to dominating, represents a person's capacity for control over an emotion. For instance, anger and fear both have negative valence and high arousal and therefore fall inside the same 2D zone, but they differ along the dominance axis. The three-dimensional model is shown in Fig. 7.

Fig. 7 Three-dimensional model of emotions

To improve recognition, further multi-dimensional models should be developed. To give emotion models more depth, additional criteria such as liking and familiarity are also being employed. When a stimulus is presented to the subject, liking indicates how much the participant likes the stimulus, whereas familiarity indicates how well the participant is acquainted with it (Aadam et al. 2022). Due to ongoing improvements in emotion recognition techniques, a variety of emotion models are being developed, which helps in recognising more distinct emotions produced by the human brain. Many studies use both discrete and dimensional models for emotion recognition (Fan et al. 2022). Some examples of different emotion models are shown in Table 1.

Table 1 Different emotion models

1.2.2 Emotion elicitation

A critical stage in emotion detection based on physiological data is the ability to appropriately produce or evoke the emotional state of the subject, often known as emotional arousal. Emotions can be evoked using one of three main techniques. The first is to construct artificial situations that elicit feelings, for instance by asking subjects to recall fragments of past experiences with different emotional overtones. The issue with this method is that it cannot guarantee that the individual will produce the intended emotion, and the duration of the evoked emotion is difficult to determine. The second is to evoke feelings by presenting videos, music, pictures, and other evocative content; this is the most frequently used approach for inducing emotional states and identifying them objectively. The third requires the subject to engage in computer or video game play. Computer games have psychological effects in addition to their physical ones. When brief films or clips are employed, subjects simply observe and listen; in computer games, on the other hand, participants actually interact with the scenario rather than merely watching the stimuli. They adopt the game characters as role models, which likewise affects their emotions.

The International Affective Picture System (IAPS) and the International Affective Digitized Sound System (IADS) are the most often used tools for evoking emotions. These datasets contain standardised emotional stimuli, which makes them useful in experiments. The IAPS consists of 1200 photos in total, divided into 20 groups of 60 photos each, and each image has a valence and an arousal value ascribed to it. The most recent version of IADS has 167 digitally recorded ordinary natural sounds that are annotated with valence, dominance, and arousal ratings. Participants annotated the dataset using the Self-Assessment Manikin system. On the other hand, the outcomes of affective labelling of multimedia may not transfer to more real-world or interactive contexts. In order to ensure the generalizability of BCI results, additional studies involving interactive emotional stimuli are required. To our knowledge, very few studies have used interactive scenarios to elicit emotions, such as people playing games or using flight simulators.

1.2.3 Multimodal analysis of emotion

As can be seen in Fig. 8, there is a wide range of possible physical and physiological modalities for human emotion identification. Physical modalities comprise audio emotion recognition, text segment analysis, and visual emotion recognition. Visual emotion recognition can in turn be broken down into subfields such as facial expression recognition and body gesture emotion recognition. The physiological modalities include EEG, fMRI, functional near-infrared spectroscopy (fNIRS), electrocardiography (ECG), photoplethysmography (PPG), electromyography (EMG), electrodermal activity (EDA), skin temperature and other signals such as eye movement, blood pressure, respiration, etc. These modalities can also be categorized by how emotional responses are measured through different activities (Can et al. 2023), as shown in Fig. 8. To get an improved understanding of an individual's emotional state, researchers in the field of emotion identification frequently combine multimodal physiological information. By combining data from multiple sources, we can build systems that are more reliable than any single source alone. There are three stages at which multimodal fusion can occur: early, intermediate, and late. Early fusion and late fusion are the most frequently used (Kamble and Sengupta 2023).

Fig. 8 Different emotion recognition modalities

(1) Early fusion: This type of fusion occurs at the feature level by selecting features from multiple signals and combining them into a single input for feature extraction or classification. It is also known as feature-level fusion. Some examples of feature-level fusion are the fusion of visual and audio modalities (Chen et al. 2018a, b), text and audio modalities (Priyasad et al. 2020), and visual, audio and text modalities (Mittal et al. 2020; Fabiano and Canavan 2019).

(2) Intermediate fusion: This type of fusion uses feature extraction from various time periods to get around synchronization problems. Additionally, the odds of defective cases can be statistically predicted by comparing the current instances to past ones (Shin et al. 2017).

(3) Late fusion: In this type of fusion, results from various classifiers are combined to produce a final result, frequently through voting. Since the classifiers can be trained independently on each modality, synchronization is not necessary. It is also known as decision-level fusion (Yang and Lee 2019; Wang et al. 2015a, b).

However, multimodal affective analysis can also be customized by combining several fusion strategies. In hybrid fusion, feature-level fusion and decision-level fusion are combined to give more accurate results (Wang et al. 2022b).
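To make the fusion levels concrete, the sketch below contrasts early (feature-level) and late (decision-level) fusion in Python with scikit-learn. It is a minimal illustration on synthetic data: the EEG and peripheral feature arrays, the logistic-regression classifiers, and the simple probability averaging are illustrative assumptions rather than a prescribed pipeline.

```python
# Minimal sketch of early vs. late multimodal fusion (hypothetical features).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
eeg_feats = rng.normal(size=(200, 32))     # hypothetical EEG feature vectors
periph_feats = rng.normal(size=(200, 8))   # hypothetical peripheral-signal features
labels = rng.integers(0, 2, size=200)      # e.g., high/low valence labels

# Early (feature-level) fusion: concatenate modality features into one vector.
early_X = np.concatenate([eeg_feats, periph_feats], axis=1)
early_clf = LogisticRegression(max_iter=1000).fit(early_X, labels)

# Late (decision-level) fusion: train one classifier per modality,
# then combine their class probabilities (here by simple averaging).
clf_eeg = LogisticRegression(max_iter=1000).fit(eeg_feats, labels)
clf_periph = LogisticRegression(max_iter=1000).fit(periph_feats, labels)
fused_proba = (clf_eeg.predict_proba(eeg_feats)
               + clf_periph.predict_proba(periph_feats)) / 2
fused_pred = fused_proba.argmax(axis=1)
```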

1.3 Features of the proposed review

Table 2 lists some of the surveys and reviews on various emotion recognition methods that have been published. We compare our paper to those already published to determine whether the articles covered key components (marked with '✓' in Table 2 if discussed), such as emotion models, ML and DL techniques, dataset discussion, and applications. Table 2 also includes each review's objective.

Table 2 Summary of previous surveys and reviews

The primary goal of the proposed review is to provide a thorough and accurate analysis of research on EEG-based emotion recognition and its applications in both medical and non-medical domains. The review's characteristics are:

  • This article offers several emotion models in both discrete and dimensional domains, in contrast to newly published review papers.

  • It includes an overview of multimodal analysis for emotion detection as well as a brief explanation of the benchmark datasets used for EEG-based emotion recognition.

  • Additionally, it illustrates a variety of uses in both medical and non-medical fields as well as potential research avenues.

This paper is organised as follows: background information on brain–computer interaction, models for evoking emotions, and multimodal emotion analysis is provided in Sect. 1. Section 2 outlines the function of each brain region in the development of emotions, discusses EEG frequency bands and EEG features, and looks at how emotions and EEG data relate to one another. A summary of EEG signal acquisition, pre-processing, feature extraction, feature reduction and selection, classification, and performance evaluation for emotion recognition problems is given in Sect. 3, along with a description of the EEG-based emotion recognition model. Section 4 describes public databases of EEG signals for emotional information. Section 5 gives background details on deep learning and machine learning methodologies, along with research that examines these approaches to recognise human emotional states using EEG-based BCI. Section 6 summarises the findings with a brief overview of the work. Applications of EEG-based emotion recognition are summarised in Sect. 7. Sections 8, 9, and 10 explore unresolved problems, challenges, and upcoming research paths, and the review is concluded in Sect. 11.

1.3.1 Papers selection method

This review of the role of machine learning and deep learning in EEG-based emotion recognition used a method called PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), which is a protocol for conducting systematic reviews and meta-analyses (Page et al. 2021). PRISMA was used to locate research and cut down the data gathering for this review, as shown in Fig. 9.

Fig. 9 PRISMA flow diagram of the systematic review process

The databases of Springer, Web of Science, IEEE, Elsevier, and Google Scholar were searched on January 3, 2022, using the following set of keywords: EEG AND emotion, ('Deep learning' or 'machine learning' or 'Deep machine learning') in emotion recognition, ('EEG' OR 'Electroencephalography') signal in emotion, etc. Studies that did not meet the inclusion criteria (listed below) and duplicate entries in these databases were eliminated. The remaining studies' full texts were then read through.

Unqualified studies were excluded based on the following standards:

  • Electroencephalography only—Studies using multi-modal datasets were disregarded in order to lower study variability.

  • Task classification—This review concentrated on how EEG signals are used to accomplish recognition and classification tasks in humans. Other research, including power analyses and non-human investigations, was not included.

  • Time—Only studies published within the last five years were considered in this evaluation due to the rapid advancement of this field's research.

In the first phase, 405 articles were gathered. As part of the initial pre-screening procedure, duplicate articles, unreviewed papers, and papers not written in English were disqualified from consideration. Each paper that passed the screening stage underwent a thorough full-text review. Additional publications were eliminated in a final screening based on the following exclusion criteria: insufficient attention to emotion recognition, lack of recent ML and DL studies, lack of detailed background information on emotion recognition, or insufficient attention to EEG-based emotion recognition. Thus, 209 papers remained and were taken into account in the review analysis.

2 Overview of EEG signal and emotion

2.1 Structure and functions of human brain

In scientific research, the human brain is regarded as the most complicated organ of the human body. The largest portion of the brain, the cerebral cortex, which is in charge of higher-order functions including language, memory, learning, emotions, decision-making, and intelligence, produces bioelectric signals (brain electrical signals) with amplitudes between roughly 10 and 100 μV. The cerebral cortex is divided into two hemispheres, and each hemisphere is again divided into four lobes, namely the frontal, parietal, temporal, and occipital lobes, as shown in Fig. 10. Cognitive reasoning and emotional processing make up the frontal lobe's main functions. The parietal lobe responds to tactile sensation and is linked to balance and coordination in the human body. The temporal lobe is primarily involved in auditory and olfactory processing, as well as emotional and mental processes. Finally, the occipital lobe is responsible for processing visual information (Wang and Wang 2021).

Fig. 10 Different brain lobes (Alarcão and Fonseca 2019)

2.2 Basics of EEG

A typical human brain generates electrical potentials of only a few tens of microvolts. These voltage changes are brought about by ionic currents flowing within and between neurons. A typical recording session to observe this spontaneous brain activity lasts around 20–40 min. This activity is what EEG signals capture (Gurrala et al. 2020).

EEG is one of the best tools for observing brain activity, also referred to as brain waves. In 1875 Richard Caton discovered the first electrical signals in animal brains, and Hans Berger used this discovery as the groundwork for the first recording of a human EEG signal, publishing his results in 1929. Following that, electrophysiologists and neurophysiologists eventually verified the outcomes of his research, and EEG research in medicine and brain science advanced quickly. By examining EEG waves, it is possible to understand how emotion varies. Neuronal potentials can be used to analyse the physiological and functional changes in the central nervous system (CNS). The EEG represents the electrical activity of a group of neurons in the region of the brain where the recording electrode is placed; therefore, it includes a wide array of relevant and significant psychophysiological data. In medicine and neuroengineering, the processing, classification and analysis of EEG signals enable the diagnosis of certain diseases and disabilities (Zhang et al. 2020).


Characteristics and Frequency Components of EEG Signal


The EEG signal serves as a direct reflection of brain activity and is crucial for understanding the physiological processes that occur in the human brain. Its primary attributes are as follows:

(1) Noise: EEG recordings are typically noisy and sensitive to disturbances from the environment. The EEG signal typically has moderate amplitude (from 50 to 100 μV) and is frequently masked by noise, distortion, artefacts, and other signals (including EOG, EMG, and ECG).

(2) Non-linearity: Other peripheral physiological signals present during the recording of EEG typically affect the potentials in the EEG. Due to the physiological modification or reaction of human tissues, EEG signals are highly nonlinear.

(3) Non-stationarity: EEG signal fluctuations are unpredictable, prone to influences from the surrounding environment, and show significant non-stationarity.

(4) Frequency-domain characteristics: The typical frequency band of EEG signals is 0.5–100 Hz; however, the low-frequency band of 0.5–30 Hz is the most important for recognition. It is typically divided into five frequency bands, each of which relates to distinct brain activity.

EEG signals are often divided into two groups: evoked and spontaneous. The spontaneous EEG is a periodic potential variation that the neurological system creates on its own, independent of any outside stimulation. Evoked potentials are measurable changes in the cerebral cortex's electrical potential caused by external stimulation of a person's sensory organs. The rhythm of the EEG signals is due to the continuous discharge of brain cells. As shown in Fig. 11, the EEG signal can be split into five different frequency bands.

Fig. 11 The waveforms of five EEG rhythms

Numerous studies on EEG signals have demonstrated that certain bands in the signal are closely associated with particular functions, and that differences in these frequency bands form the basis for the diagnosis of specific disorders and diseases. Table 3 gives a short description of the five frequency bands (Kawala-Sterniuk et al. 2021).

Table 3 Frequency range and description of EEG signals

2.3 EEG-based BCI in emotion recognition

When compared to other peripheral neuro-physiological signals, EEG is able to detect changes in brain activity accurately, providing details about internal emotional states. Additionally, the high temporal resolution of EEG enables the monitoring of an emotional state in real time. Consequently, many EEG-based emotion recognition techniques have recently been developed (Wang et al. 2022a).

To teach a machine to interpret and recognise emotions, we must first understand the physical sources of emotions in our bodies. Emotions can be communicated verbally via well-known terms, non-verbally via changes in voice tone or body language, or physiologically through our nervous system. However, facial expressions and voice are not reliable indicators of emotion because they can be manipulated and cannot be taken as the outcome of a particular mood, whereas physiological signals are more precise because the user has no influence over them.

Physiological changes are the basic causes of emotion in our bodies. There are two types of physiological changes: those that affect the peripheral nervous system (PNS) and those that affect the CNS. The CNS is made up of the spinal cord and brain. Various behaviours and emotions arise from changes in electrical activity in the human brain, which can be detected by an EEG (Houssein et al. 2022). Physiological signals like EEG contain a wealth of useful information about many emotional states of the brain. EEG is a particularly useful tool for understanding human emotional states because it reacts more rapidly and accurately to changes in affective states. In EEG, the low-frequency bands reflect emotions more strongly than the high-frequency bands (Wang and Wang 2021). Happiness, sadness, and fear exhibit considerably different average values for beta, alpha, and theta waves along the brain midline, so the midline power spectrum serves as an important tool for classifying emotions (Zhao et al. 2018). According to physiological studies, the cerebral cortex has a substantial impact on humans' higher emotional and cognitive capabilities. It may be possible to determine the brain areas that are strongly associated with emotion by using EEG-based emotion detection. Some studies have reported a connection between emotional states and particular brain regions. The left frontal regions of the brain are stimulated by enjoyment, according to Ekman and Davidson (1993). Increased theta-band power in the frontal midline is correlated with positive emotions, whereas negative emotions are correlated with the reverse. These investigations demonstrate a relationship between emotional changes and the properties of the related EEG signals, which is highly relevant for the study of EEG-based emotion classification. Additionally, they provide a neurophysiological basis for identifying emotions in EEG data.

3 EEG based BCI emotion recognition methodology

The process of recognising emotions from EEG signals can be broken down into the following components: emotional induction is the first step, followed by signal acquisition, pre-processing, feature extraction and selection, and emotional pattern learning and classification, as shown in Fig. 12. Each step is discussed in order below.

Fig. 12 Emotion recognition steps

3.1 Signal acquisition

EEG signal acquisition can be performed by both invasive and non-invasive methods. In invasive methods, the signal-to-noise ratio, signal strength and accuracy are high compared to non-invasive approaches, but the drawback is that they require surgical implantation into the skull cavity, with electrodes entering the cerebral cortex to acquire signals. Therefore, non-invasive methods are most commonly used, as they are affordable and signal acquisition can be done easily with wearable EEG caps or headsets that place electrodes along the scalp. EEG electrodes acquire, amplify, and transfer the signals to a computer (or mobile device) for storage and analysis. There are many inexpensive EEG-based BCI devices on the market right now. However, with prolonged use, many current EEG-based BCI systems become difficult or uncomfortable to wear. Therefore, their efficiency needs improvement.

3.1.1 Acquisition equipment

The EEG acquisition equipment used in emotional EEG experimental studies varies according to practical requirements. Table 4 lists various EEG acquisition devices available on the market. The Biosemi ActiveTwo, Neuroscan Quik-Cap and Emotiv EPOC are examples of popular acquisition devices.

Table 4 Various EEG recording devices

The Biosemi Active Two is a high-resolution, second-generation EEG monitoring device. In terms of sampling rate, bandwidth, and common mode rejection ratio, it is the industry leader. The signal quality is unaffected by the high electrode impedance underneath the active electrode. As a result, there is absolutely no need to prepare the skin, making the experiment's operation more convenient and efficient.

The Emotiv EPOC is a non-implantable electrode device with 16 sensors. While wearing it, the headset must be positioned properly and excess movement should be avoided. Neuroscan's Quik-Cap electrode cap, which uses a conventional EEG electrode placement method, is simple to use and suitable for many studies. Emotion-related EEG investigations use varied sampling rates; 128, 512, and 1000 Hz are standard (Wang and Wang 2021).

3.1.2 Electrodes distribution

According to several studies, the majority of emotion-related EEG electrodes are located over the prefrontal lobe, the edge of the temporal lobe, and the posterior occipital lobe. These areas correspond closely with the brain regions responsible for emotion generation. By choosing an appropriate electrode distribution, the derived feature dimension can be drastically lowered, the experiment can be simplified, and the computational complexity can be reduced.

During recording, the international 10–20 electrode system is used to apply various electrodes (dry, wet, gel, etc.) to the scalp. This system was later refined into the 10–10 electrode system for better spatial coverage, as shown in Fig. 13. The electrodes in this system are positioned systematically and named after the areas of the brain they cover: frontopolar, anterior frontal, frontal, frontocentral, temporal, parietal, and occipital are represented by FP, AF, F, FC, T, P, and O, respectively. An odd-number suffix denotes the left hemisphere, while an even number denotes the right hemisphere.

Fig. 13 Electrode positions and labels in the 10–10 system (Hart n.d.)

3.1.3 Normalisation

The amplitude of EEG signals is affected by factors such as variations in a person's alertness during the day, age, and sex. Therefore, to compensate for this variation, measured values must be normalised. There are three approaches to normalising the feature data, as given below; a brief code sketch follows the list:

(1) The first method involves recording baseline signals, in which the individual is either not subjected to any stimuli or is only presented with simple, calming stimuli. Features obtained under other conditions (where the subject is engaged in a task) are then normalised by subtracting the baseline value, dividing by the baseline value, or combining these methods.

(2) In the second method, the features extracted from the baseline data are added to the feature space as independent features, doubling the feature space's dimension. This strategy is known as the “baseline matrix”.

(3) The third method involves converting data from each subject individually, or from all subjects together, to a specific range (for example, from 0 to 1 or from −1 to 1). In this manner each feature is treated separately (Novak et al. 2012).
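The sketch below illustrates the first and third normalisation strategies described above (baseline subtraction/division and rescaling to a fixed range). It assumes hypothetical feature matrices of shape (trials × features); the function names and the small epsilon guard are illustrative, not part of any standard toolbox.

```python
# Minimal sketch of baseline and range normalisation of EEG features.
import numpy as np

def baseline_normalise(task_feats, baseline_feats, mode="subtract"):
    """Normalise task-condition features against resting/baseline features."""
    base = baseline_feats.mean(axis=0)
    if mode == "subtract":
        return task_feats - base
    if mode == "divide":
        return task_feats / (base + 1e-12)   # epsilon avoids division by zero
    raise ValueError("mode must be 'subtract' or 'divide'")

def range_scale(feats, lo=0.0, hi=1.0):
    """Rescale each feature column to the range [lo, hi] (e.g., 0-1 or -1-1)."""
    fmin, fmax = feats.min(axis=0), feats.max(axis=0)
    return lo + (feats - fmin) * (hi - lo) / (fmax - fmin + 1e-12)
```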

3.2 Signal pre-processing

For EEG signals, pre-processing refers to signal enhancement and cleaning. EEG signals are weak and readily affected by internal and external noise. Therefore, pre-processing steps are necessary to prevent noise contamination that can impact subsequent classification. Various artefact removal and filtering techniques are used to produce pre-processed EEG signals, as discussed in Table 5.

Table 5 Various signal pre-processing techniques

Artefact removal: Artefacts are produced by other biological signals or by outside interference and may appear anywhere in the recording. During signal acquisition, blinking (EOG), moving the eyes or muscles (EMG), the beating heart (ECG), or external sources generate electrical signals that mix with the EEG data. These extra signals are called artefacts. It can be challenging to distinguish artefacts from the EEG data since their amplitudes can be similar. In order to generate an artefact-free EEG signal from which accurate characteristics can be extracted, the noise in the signal must be removed or attenuated (Agustina Garcés and Orosco 2018).

Independent component analysis (ICA), principal component analysis (PCA), common spatial patterns (CSP) and common average reference (CAR) are popular pre-processing techniques that have been used in several studies. A brief description of these methods is given in Table 5. Among the artefact removal techniques, ICA is most commonly used because it is very effective.

Filtering: Frequency-domain filters can be used to eliminate artefacts in the recorded EEG signals by reducing the bandwidth of the signal under analysis. These filters are designed so as not to modify or distort the signals in any other manner. Some of the most popular filters are notch filters, Butterworth filters, and low-frequency and high-frequency filters (also known as high-pass and low-pass filters, respectively). Together, high-pass and low-pass filters retain frequencies between about 0.5 and 50–60 Hz. To eliminate unwanted very-low-frequency components, such as those related to breathing, high-pass filters with a cut-off frequency typically below 0.5 Hz are used, while high-frequency noise is reduced by applying low-pass filters with a cut-off frequency of roughly 50–60 Hz. Butterworth filters have a wide transition zone and a flat response in the stopband and passband. Notch filters block the transmission of a single frequency instead of a range of frequencies; to reject the strong 50 Hz power-line interference, notch filters with a null frequency of 50 Hz are required.
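As an illustration of the filtering steps described above, the sketch below band-passes a single EEG channel between 0.5 and 45 Hz and applies a 50 Hz notch using SciPy. The sampling rate, filter order and cut-off values are assumptions chosen for the example, not recommendations.

```python
# Minimal sketch: Butterworth band-pass plus 50 Hz notch filtering of one channel.
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt

def filter_eeg(signal, fs=256.0, low=0.5, high=45.0, notch_hz=50.0):
    # 4th-order Butterworth band-pass between `low` and `high` Hz.
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    bandpassed = filtfilt(b, a, signal)
    # Narrow notch to suppress power-line interference at `notch_hz`.
    bn, an = iirnotch(notch_hz, Q=30.0, fs=fs)
    return filtfilt(bn, an, bandpassed)

# Example on synthetic data: a 10 Hz component contaminated by 50 Hz noise.
fs = 256.0
t = np.arange(0, 2, 1 / fs)
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
clean = filter_eeg(raw, fs=fs)
```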

Baseline correction and removal: In many emotion identification applications, the baseline (pre-stimulus) signal must be corrected or removed during pre-processing before being compared to and analysed with the post-stimulus signal, as in Jiménez-Guarneros and Alejo-Eleuterio (2022). Baseline signal removal uses the sliding-window principle; windowing techniques and window sizes can be selected based on the needs of the research.

3.3 Feature extraction

The stage after pre-processing and noise reduction is feature extraction. The BCI must extract crucial features from the cleaned signals so that they can be passed to the classifier. The main goal of feature extraction in EEG-based emotion recognition is to find information that can accurately reflect a person's emotional state; algorithms for classifying emotions then use this information. The extracted properties largely determine how accurately emotions are identified. Consequently, it is crucial to identify the key EEG characteristics of emotional states. Conventional EEG feature analyses are frequently carried out in the time, frequency, and time–frequency domains. Because EEG data is nonlinear, EEG signals may also be studied more thoroughly using nonlinear dynamics analysis. The time, frequency, time–frequency, and nonlinear feature analyses are the four EEG feature analysis techniques covered in this section. The feature extraction techniques applied in the studies covered by this review are shown in Table 6.

Table 6 Different feature extraction methods

3.3.1 Time domain


Time-domain analyses have been used in the study of brain activity for a very long time. Most EEG acquisition tools available today collect EEG data in the time domain. There are numerous methods for analysing EEG in the time domain, including the event-related potential (ERP), the histogram analysis method, higher-order crossings (HOC) (Fan et al. 2022), PCA (Goshvarpour and Goshvarpour 2023), ICA (Chen et al. 2020a, b), Higuchi's fractal dimension (FD) (Liu and Sourina 2014), and the Hjorth parameters, which serve as a gauge of the self-similarity and complexity of the signals in this domain. These methods depend on the extraction of time-based features. Statistical features including mean, power, maximum, minimum, median, standard deviation, kurtosis, skewness, relative band energy, and variance are commonly extracted; the main ones are defined in Eqs. (1)–(7) below. Time-domain analysis begins with the geometric properties of EEG data, which the EEG analyser can accurately and intuitively analyse statistically. Time-domain features retain the EEG information with little loss. However, there is no standardised technique for analysing the time-domain aspects of EEG signals due to the complex waveform of EEG data; EEG analysers must therefore possess substantial experience and understanding.

$${\text{Power: }}{P}_{x}=\frac{1}{T}\sum\limits_{t=1}^{T}{|x\left(t\right)|}^{2}$$
(1)
$${\text{Mean: }}{\mu }_{x}=\frac{1}{T}\sum\limits_{t=1}^{T}x\left(t\right)$$
(2)
$${\text{Standard deviation: }\sigma }_{x}=\sqrt{\frac{1}{T-1}\sum\limits_{t=1}^{T}{(x\left(t\right)-{\mu }_{x})}^{2}}$$
(3)
$${\text{1st\;difference: }\delta }_{x}=\frac{1}{T-1}\sum\limits_{t=1}^{T-1}\left|x\left(t+1\right)-x\left(t\right)\right|$$
(4)
$$\text{Normalized 1st difference: }\overline{{\delta }_{x}}=\frac{{\delta }_{x}}{{\sigma }_{x}}$$
(5)
$$2\text{nd difference: } {\gamma }_{x}=\frac{1}{T-2}\sum\limits_{t=1}^{T-2}\left|x\left(t+2\right)-x\left(t\right)\right|$$
(6)
$$\text{Normalized 2nd difference: }\overline{{\gamma }_{x}}=\frac{{\gamma }_{x}}{{\sigma }_{x}}$$
(7)
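A minimal NumPy sketch of the statistics in Eqs. (1)–(7), applied to a single EEG channel x of length T, is given below; the function name and the returned dictionary are illustrative.

```python
# Minimal sketch: time-domain statistical features of Eqs. (1)-(7) for one channel.
import numpy as np

def time_domain_features(x):
    power = np.mean(np.abs(x) ** 2)            # Eq. (1)
    mean = np.mean(x)                          # Eq. (2)
    std = np.std(x, ddof=1)                    # Eq. (3)
    d1 = np.mean(np.abs(np.diff(x)))           # Eq. (4): mean absolute 1st difference
    d1_norm = d1 / std                         # Eq. (5)
    d2 = np.mean(np.abs(x[2:] - x[:-2]))       # Eq. (6): mean absolute 2nd difference
    d2_norm = d2 / std                         # Eq. (7)
    return {"power": power, "mean": mean, "std": std,
            "first_diff": d1, "first_diff_norm": d1_norm,
            "second_diff": d2, "second_diff_norm": d2_norm}
```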

3.3.2 Frequency domain


It has been shown that frequency-domain features work better for automatic emotion identification from EEG than time-domain features. Frequency-domain analysis techniques transform time-domain EEG signals into the frequency domain in order to assess and extract frequency-domain properties. The EEG signal is often broken down into distinct sub-bands, and characteristics including power spectral density (PSD), the logarithmic energy spectrum, higher-order spectra (HOS), and differential entropy (DE) are retrieved for study. Applying the fast Fourier transform (FFT) directly to a brief EEG segment is the most commonly used technique for performing frequency analysis.
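The sketch below illustrates two of these frequency-domain features: band power from a Welch PSD and differential entropy per band. It assumes the commonly used Gaussian approximation, under which DE reduces to 0.5·ln(2πe·σ²) of the band-filtered signal; the band limits and sampling rate are example values.

```python
# Minimal sketch: band power (Welch PSD) and differential entropy per EEG band.
import numpy as np
from scipy.signal import welch, butter, filtfilt

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_powers(x, fs=128.0):
    freqs, psd = welch(x, fs=fs, nperseg=int(2 * fs))
    return {name: np.trapz(psd[(freqs >= lo) & (freqs < hi)],
                           freqs[(freqs >= lo) & (freqs < hi)])
            for name, (lo, hi) in BANDS.items()}

def differential_entropy(x, fs=128.0):
    de = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        xb = filtfilt(b, a, x)
        de[name] = 0.5 * np.log(2 * np.pi * np.e * np.var(xb))  # Gaussian assumption
    return de
```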

3.3.3 Time–frequency domain


By combining information from the time and frequency domains, time–frequency analysis enables localised time–frequency domain analysis. Time–frequency-domain characteristics make it possible to capture time-varying and non-stationary signals, which can be utilised to characterise different emotional states. The wavelet transform (Liu and Fu 2021) is the most frequently used method of time–frequency analysis. Other crucial time–frequency domain analysis techniques include the wavelet packet transform (WPT) and the short-time Fourier transform (STFT) (Lin et al. 2010), among others.
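A short SciPy sketch of one such time–frequency representation, the STFT, is shown below; wavelet-based decompositions (e.g., with PyWavelets) follow the same windowed pattern. The window length, overlap and the synthetic two-tone signal are assumptions for illustration.

```python
# Minimal sketch: time-frequency map of one EEG channel via the STFT.
import numpy as np
from scipy.signal import stft

fs = 128.0
t = np.arange(0, 4, 1 / fs)
# Synthetic signal whose dominant rhythm changes halfway through the epoch.
x = np.sin(2 * np.pi * 10 * t) * (t < 2) + np.sin(2 * np.pi * 20 * t) * (t >= 2)

# 1-second Hann windows with 50% overlap give a coarse time-frequency map.
freqs, times, Z = stft(x, fs=fs, nperseg=int(fs), noverlap=int(fs) // 2)
spectrogram = np.abs(Z) ** 2   # power at each (frequency, time) bin
```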

3.3.4 Non-linear


Nonlinear dynamic analysis can be used to study highly complex nonlinear and aperiodic properties of EEG data. Numerous nonlinear analysis techniques have gained popularity in recent years for the study of EEG data. Permutation entropy, approximate entropy (Wang et al. 2022a, b, c), power spectrum entropy, and sample entropy (Zhang et al. 2016) are examples of nonlinear dynamic approaches.

3.4 Feature selection and reduction

The feature selection or reduction method is essential for EEG-based emotion recognition. The feature vectors in a BCI system are frequently very large. As a result, feature selection and/or reduction strategies are routinely employed to reduce the number of features. These methods reduce the complexity of the problem by providing the classifier only with features that contain important information. An appropriate feature selection and reduction approach can improve both the efficiency and the precision of model training. Table 7 shows a few feature selection strategies used with emotion models.

Table 7 Different feature selection methods

Principal component analysis (Goshvarpour and Goshvarpour 2023) and linear discriminant analysis (Liu et al. 2018) are prominent techniques for reducing EEG features. PCA tries to represent d-dimensional data in a lower-dimensional space, which reduces both the number of features and the time and space complexity. LDA creates a new composite variable from the original predictors by maximising the differences between the predefined groups on this new variable; the resulting composite of prediction scores is called the discriminant score.
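The following scikit-learn sketch shows how PCA and LDA might be applied to reduce an EEG feature matrix; the synthetic data, the 95% variance threshold and the two discriminant components are illustrative assumptions.

```python
# Minimal sketch: unsupervised (PCA) and supervised (LDA) feature reduction.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 64))       # hypothetical EEG feature vectors (trials x features)
y = rng.integers(0, 3, size=120)     # three hypothetical emotion classes

# PCA: keep the components that explain 95% of the variance.
X_pca = PCA(n_components=0.95).fit_transform(X)

# LDA: project onto at most (n_classes - 1) discriminant axes using the labels.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
```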


Minimal redundancy maximal relevance


Using mutual information, the mRMR method evaluates the relevance of features to the target classes and their redundancy with other features in the feature space. It is founded on maximising relevance while minimising redundancy. Maximum relevance is defined as follows:

$$D=\frac{1}{\left|S\right|}\sum\limits_{{x}_{i}\in S}I({x}_{i},c)$$
(8)

where S stands for the feature set and \(I\left({x}_{i},c\right)\) for the mutual information between feature \({x}_{i}\) and target class c. The minimum redundancy among features is calculated as follows:

$$R=\frac{1}{{\left|S\right|}^{2}}\sum\limits_{{x}_{i},{x}_{j}\in S}I({x}_{i},{x}_{j})$$
(9)

where \(I({x}_{i},{x}_{j})\) stands for mutual information between feature i and j.

Combining Eqs. 8 and 9 produces the feature selection criterion for the mRMR method:

$$max(D-R)$$
(10)
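A minimal greedy implementation of this max(D − R) criterion is sketched below, using scikit-learn's mutual-information estimators as stand-ins for I(·,·); the feature matrix, label vector and the choice of k are hypothetical, and the redundancy term averages over the already selected features only.

```python
# Minimal sketch: greedy mRMR feature selection with mutual-information estimates.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k=10):
    relevance = mutual_info_classif(X, y)       # I(x_i, c) for every feature
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        scores = []
        for i in remaining:
            if selected:
                # Mean redundancy of candidate i with the already selected features.
                red = np.mean([mutual_info_regression(X[:, [i]], X[:, j])[0]
                               for j in selected])
            else:
                red = 0.0
            scores.append(relevance[i] - red)    # max(D - R) criterion of Eq. (10)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```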

3.5 Classification

The main task in emotion recognition is to categorise the input signals into one of the available classes. Finding the optimal classifier that can correctly classify a variety of emotions is one of the most important steps in creating a successful emotion classification system. To determine the true class of an unknown observation in a validation dataset, a classifier uses a mathematical function. A variety of classification techniques have been used to categorise affective EEG data in the affective computing field. These classifiers include basic machine learning algorithms such as the support vector machine (Zhang et al. 2016), decision trees (Li et al. 2022a, b, c, d) and linear discriminant analysis, as well as more sophisticated deep learning classifiers such as recurrent neural networks and long short-term memory networks. K-nearest neighbour (KNN) (Mehmood and Lee 2016) and random forest (RF) (Zhang et al. 2021) are a few more classification models that are appropriate for emotion recognition.

3.6 Model assessment and selection

3.6.1 Evaluation Method


The generalisation error of a classifier can be estimated by conducting tests that assess its capacity to categorise new samples; the testing error on the testing set can be roughly equated to the generalisation error.

3.6.2 Hold-out method


The dataset D is split into two mutually exclusive sets: the training set S and the testing set T. It is important to keep the data distribution as consistent as possible between them. In most cases, the experiments require multiple runs of random division, and the average value is computed as the evaluation result. Usually, 20–30% of the dataset is held out for testing and the remaining samples are used for training.

3.6.3 Cross-validation method


Two popular types of cross-validation techniques are used: k-fold cross-validation and leave-one-out. For k-fold cross-validation, the initial sample is split into K sub-samples. K-1 sub-samples are used for training, while the remaining sub-sample serves as the testing set. The process is repeated K times so that every sub-sample is used for validation once, and the final result is the average of the K validations. Tenfold cross-validation is the most commonly used variant. In leave-one-out (LOO), one sample is used as the testing set and the remaining samples are used as the training set. Even though LOO produces more precise results, training takes too much time when the dataset is big.
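The sketch below runs 10-fold cross-validation and leave-one-out on the same classifier using scikit-learn; the SVM and the synthetic feature matrix are placeholders, not the evaluation protocol of any specific study.

```python
# Minimal sketch: k-fold and leave-one-out cross-validation of an emotion classifier.
import numpy as np
from sklearn.model_selection import cross_val_score, KFold, LeaveOneOut
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))      # hypothetical feature vectors
y = rng.integers(0, 2, size=100)    # binary emotion labels

clf = SVC(kernel="rbf")
kfold_acc = cross_val_score(clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
loo_acc = cross_val_score(clf, X, y, cv=LeaveOneOut())

print(f"10-fold mean accuracy: {kfold_acc.mean():.3f}")
print(f"LOO mean accuracy:     {loo_acc.mean():.3f}")
```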

3.6.4 Performance evaluation parameters


To allow comparison with other study groups, the results for emotion recognition must be reported consistently. Therefore, it is essential to properly choose and define evaluation procedures. The confusion matrix and accuracy are the most commonly suggested performance evaluation metrics for measuring the effectiveness of emotion classifiers.

A confusion matrix, shown in Table 8, records the number of correctly identified and misidentified points during classifier evaluation. Six classification performance measures—accuracy, specificity, recall (sensitivity), precision, F-measure, and area under the curve (AUC)—are often computed based on the confusion matrix, as shown in Table 9. These metrics are generally computed from the four basic counts of a binary classification result: true positives (TP) and true negatives (TN) indicate that the predicted value matches the actual value, whereas false positives (FP) and false negatives (FN) indicate that the predicted value does not match the actual value.

Table 8 Confusion matrix
Table 9 Performance evaluation metrics
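As a small illustration, the sketch below derives the main Table 9 metrics from the four counts of a binary confusion matrix; the counts themselves are made-up numbers.

```python
# Minimal sketch: evaluation metrics computed from binary confusion-matrix counts.
def binary_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                  # sensitivity
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f_measure": f_measure}

print(binary_metrics(tp=40, tn=35, fp=10, fn=15))   # hypothetical counts
```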

4 Datasets

Various emotion databases are openly accessible, so anybody can download and analyse them without requiring permission or the involvement of an organisation. The available datasets are listed in Table 10. Among them, the DEAP and SEED datasets are most commonly employed for research. Figure 14 shows a pie chart of the EEG datasets used in the papers discussed in this review.

Table 10 Details of available dataset
Fig. 14 Pie chart of EEG datasets used in the papers discussed in the review

DEAP: The DEAP database for analysing human emotions includes 32-channel EEG and 12 peripheral physiological signals, comprising 4 EMG (electromyogram), 1 RSP (respiration pattern), 1 GSR (galvanic skin response), 4 EOG (electrooculogram), 1 plethysmograph (blood volume pressure) and 1 skin temperature channel. Pre-processing was carried out after the data was gathered at a sampling rate of 512 Hz: the signals were downsampled to 128 Hz, and EOG artefacts were removed by applying a bandpass filter between 4 and 45 Hz to the EEG channels. The database includes 32 participants. Each participant watched 40 one-minute music videos, each with a different emotional theme, and rated each video for arousal, valence, dominance, liking and familiarity after each trial. Arousal and valence ratings were collected with self-assessment manikins, and thumbs-up/thumbs-down responses were used for the liking rating, which indicated how much the individual liked the video (Koelstra et al. 2012).

SEED: Zheng and Lu recorded the SJTU Emotion EEG Dataset (SEED), a physiological dataset gathered from 15 people while the participants watched film clips. Emotions in the dataset were classified into three categories: positive, negative, and neutral. After watching the videos, participants were required to complete a questionnaire. The EEG was recorded in three different sessions, with an interval of at least a week between sessions, and the international standard 10–20 system was used to record the EEG signals (Duan et al. 2013).

MAHNOB-HCI: MAHNOB-HCI is a multimodal database. Data were collected from 27 people while they watched 20 videos and images. The recorded data include EEG signals (32 channels), ECG signals (3 channels), ERG signals (2 channels), GSR signals (2 channels), respiration amplitude signals, and skin temperature signals. The experiment was divided into two phases. In the first session, the participants were asked to watch video clips and afterwards respond to a questionnaire regarding their feelings. In the second session, the films and photographs were each shown twice, once with correct or incorrect labels and once without (Soleymani et al. 2012).

GAMEEMO: The GAMEEMO database is an emotion dataset based on brain physiological signals (EEG). 28 participants, aged 20–27, took part in the study at Firat University's Department of Software Engineering; the gender distribution of the subjects was not stated. Four computer games were used as stimuli to elicit four fundamental feelings (boredom, calmness, horror, and funniness) from the participants, each played for a 5-minute period, giving 20 min of EEG data per subject in total. Data were gathered across 14 channels with the wearable EMOTIV EPOC+ mobile EEG device. 38,252 samples were collected from each participant for each game (Alakus et al. 2020).

5 Machine learning and deep learning in emotion recognition

The recognition of emotions can be posed as either a classification or a regression problem; the distinction depends on the emotional model employed to represent emotions. In categorical representations, emotions are treated as discrete entities with labels. In contrast, dimensional models attempt to characterise emotions using continuous values of their defining characteristics, which are frequently depicted on axes. The majority of earlier methods approach emotion recognition as a classification problem; emotion dimension regression has received considerably less attention in the literature than emotion classification. As a result, we focus on various machine learning and deep learning classification approaches in this part, as shown in Fig. 15.

Fig. 15 Various ML and DL models used in emotion recognition

5.1 Machine learning in emotion recognition

Machine learning algorithms are employed to categorise various emotional states from EEG-based BCI in emotion recognition systems. Machine learning has been a key component of BCI data analysis, since it helps distinguish between different brain activity patterns. Important facts and rules can be learned from a source task and then applied to a target task through machine learning. Additionally, machine learning algorithms can be used to analyse data stored in a data management system in order to retrieve potentially crucial information. The final classification or prediction results can be greatly influenced by the machine learning method that is selected (Lv et al. 2021).

Machine learning models can be divided into two categories: supervised learning and unsupervised learning. Using training data, supervised machine learning determines the classifier's parameters: the learning task is to adjust the parameters of the system for every valid input value after observing the corresponding output value. To verify the effectiveness of a learned algorithm, a test dataset containing data that was not used during learning is fed into the classifier. As opposed to supervised learning, unsupervised learning uses only the input data and a cost function that must be minimised to select the parameters. A number of ML models have recently been put into practice for the classification of EEG signals for human emotion recognition (Houssein et al. 2022). Among these methods are the support vector machine (Zhang et al. 2016), Naïve Bayes (Hinvest et al. 2022), k-nearest neighbour (Mehmood et al. 2016), decision trees (Li et al. 2022a, b, c, d), random forest (Zhang et al. 2021), and artificial neural networks (Khubani and Kulkarni 2022), which are widely used as classification methods. In the following sections, we provide a brief description of each, and Table 11 provides details of some existing machine learning methods used for EEG-based emotion recognition.

Table 11 Related studies using machine learning algorithms

Support Vector Machine (SVM): SVM is a supervised learning method that separates the data into two groups. The classes are separated by discriminant hyperplanes; the optimal hyperplane is the one that maximises the margin to the nearest training samples (the support vectors). To boost performance, SVM makes use of a range of kernel functions, including linear, polynomial, and radial basis (Khaliq and Sivani 2022). In Zhang et al. (2016), SVM was used to identify emotions from the DEAP dataset. The authors used EMD for signal decomposition and sample entropy as the feature. Binary and multiclass classification were performed on the valence and arousal scales, achieving accuracies of 94.98% and 93.20%, respectively.
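
For illustration only, the sketch below compares the kernel choices mentioned above on synthetic features; it does not reproduce the EMD/sample-entropy pipeline of Zhang et al. (2016), and all shapes and parameter values are assumptions.

```python
# Hedged sketch: cross-validating an SVM with different kernels on
# synthetic EEG-like features (not the DEAP data used in the cited study).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 40))        # 300 trials x 40 entropy/band-power style features
y = rng.integers(0, 2, size=300)      # binary arousal labels

for kernel in ("linear", "poly", "rbf"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{kernel:>6} kernel, mean CV accuracy: {scores.mean():.3f}")
```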

K-Nearest Neighbor (KNN): K-NN is a relatively simple algorithm to learn and put into practice. However, because it stores the entire training set, it requires more time and memory. As a nonlinear classifier, KNN can model complex decision boundaries, but it is prone to overfitting, which limits generalisation. Wang and Mo (2013) found that a mean recognition rate of 82% could be achieved by combining a K-NN (K = 4) classifier with a Tabu-search feature-selection heuristic and a fourfold cross-validation approach to categorise four emotions (happiness, sorrow, pleasure, anger).
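
A minimal sketch of choosing K with four-fold cross-validation, loosely following the protocol described above, is given below; the random features are placeholders and the snippet does not reproduce Wang and Mo's Tabu-search feature selection.

```python
# Illustrative sketch: choosing K for a K-NN emotion classifier with
# four-fold cross-validation on synthetic placeholder features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 20))        # 200 trials x 20 selected features
y = rng.integers(0, 4, size=200)      # four emotion classes

for k in (1, 3, 4, 5, 7):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=4)
    print(f"K={k}: mean accuracy {scores.mean():.3f}")
```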

Linear Discriminant Analysis (LDA): As a linear classifier, LDA projects feature vectors onto a discriminant subspace to establish group membership. The classification step requires no additional parameters. The scatter matrices must be non-singular for LDA to be applicable; Pseudoinverse LDA (pLDA) was employed to get around this restriction. Three individuals' emotional states were correctly classified at the 95.5% level using SBS and pLDA in Kim and André (2008).

Random Forest: Random Forest (RF) (Breiman 2001) is an ensemble method that builds decision trees for classification and regression during training. It is based on the bagging algorithm. The method handles large volumes of data by using a limited number of attributes to create each decision tree, so training time is significantly reduced compared to other classifiers (Ayata et al. 2017). These qualities make Random Forest a common classification method. A step-by-step RF working procedure is given below, followed by a minimal code sketch:

  • Training sets are drawn at random (with replacement) and are equal in size to the original sample set.

  • A decision tree is built from each training set.

  • At each node, a subset of attributes is selected at random with equal likelihood, and the best attribute among them is chosen to split the node.

  • Each decision tree makes a prediction.

  • Each predicted result counts as one vote.

  • The final output is the result with the most votes.
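
The steps above map directly onto scikit-learn's bagging-based implementation; the following is a minimal sketch with synthetic placeholder data, not a reproduction of any cited experiment.

```python
# Minimal Random Forest sketch mirroring the steps listed above:
# bootstrapped training sets, per-tree random attribute subsets, and
# majority voting over the trees' predictions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 30))        # 500 trials x 30 EEG features (synthetic)
y = rng.integers(0, 3, size=500)      # three emotion classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=3)

# n_estimators = number of bagged decision trees; max_features limits the
# attributes considered at each split, as in the third step above.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=3)
forest.fit(X_tr, y_tr)
print("held-out accuracy:", forest.score(X_te, y_te))   # majority vote over trees
```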

Decision tree: The decision tree (DT) is a popular machine-learning technique for regression and classification. The method divides a data set into subsets based on a criterion that maximises separation and then builds a tree (Loh 2011). The most prevalent criterion is information gain, which maximises the entropy reduction achieved by a split. Decision tree leaf nodes carry class labels, whereas non-terminal nodes such as the root and internal nodes contain attribute test conditions that separate records with different properties (Bastos et al. 2020). Unlike many other machine learning methods, decision trees are not black-box models and can be expressed as rules. This advantage makes these models popular in many application domains.
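
The information-gain criterion mentioned above can be written, in its standard form, as

```latex
H(S) = -\sum_{i=1}^{C} p_i \log_2 p_i , \qquad
IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

where S is the set of records at a node, p_i the proportion of class i in S, A a candidate splitting attribute, and S_v the subset of S taking value v on A; the split with the largest gain, i.e. the largest entropy reduction, is chosen.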

Artificial neural network: Artificial neural networks (ANNs) are a machine learning technique inspired by the human brain. Like neurons in the human nervous system, ANNs can learn from past data and respond with classifications or predictions. The system consists of artificial neurons, or nodes, and their connections. The influence of one unit on another is determined by the weight of the connection between them. Different units function as input, hidden, and output nodes, performing summation and thresholding (Basheer and Hajmeer 2000). Neural networks consist of three layers: input, hidden, and output, as depicted in Fig. 16.

Fig. 16
figure 16

Architecture of ANN

The initial layer of an ANN accepts data such as numbers, words, image pixels, and audio recordings. Hidden layers sit between the input and output layers of the ANN model and process the input data through mathematical computations to identify patterns. The output layer presents the outcome of the hidden layers' computations (Fausett 2005). A neural network's performance depends on several parameters and hyper-parameters: the output is governed largely by parameters such as weights and biases and by hyper-parameters such as batch size and learning rate. The literature describes numerous neural network types and designs, each with a distinct learning process (Basheer and Hajmeer 2000; Fausett 2005; Sharma et al. 2020a, b).
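
As a hedged illustration of the summation-and-thresholding behaviour described above, the numpy sketch below runs one forward pass through a three-layer network of the kind shown in Fig. 16; the layer sizes, weights, and softmax output are arbitrary assumptions.

```python
# One forward pass through a small input -> hidden -> output network:
# each layer computes a weighted sum plus bias, followed by a nonlinearity.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
x = rng.normal(size=8)                              # 8 input features (e.g. band powers)

W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)     # input  -> hidden weights and biases
W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)      # hidden -> output (3 emotion classes)

h = sigmoid(W1 @ x + b1)                            # hidden activations
logits = W2 @ h + b2
probs = np.exp(logits) / np.exp(logits).sum()       # softmax over emotion classes
print("class probabilities:", np.round(probs, 3))
```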

5.2 Deep learning for emotion recognition

Deep learning (DL) is a subset of machine learning and artificial intelligence that is capable of learning directly from the provided data (Dong et al. 2021). DL can produce considerable results on a variety of classification and regression problems and datasets. With applications in healthcare, visual recognition, text analytics, cybersecurity, and many other areas, it has gained popularity in the computing industry (Sarker 2021). DL uses multiple hidden layers in neural networks to perform several levels of nonlinear operations. In a classification task, complicated functions that recognise output classes can be learned through a series of transformations across numerous hidden layers. Even though DL approaches for automated emotion recognition are relatively new compared to the long history of emotion research in psychophysiology, a number of articles on their use have recently been published.

Feature extraction and feature selection reduce the size of the feature set in order to increase classification performance and reduce computation time. There are two forms of feature extraction: shallow and deep. Shallow features are those that have been manually crafted in various analytic domains, such as the time domain, frequency domain, and time–frequency domain. Numerous feature selection or reduction techniques are employed to reduce higher-dimensional feature sets, as mentioned in the section above. Unfortunately, shallow features rely heavily on hypotheses and demand a substantial amount of labelled data, both of which can be difficult to obtain in real-world applications. Manual feature extraction and selection are typically time-consuming and tiresome, yet they have a major impact on the performance of machine learning models. Hand-crafted shallow features are usually domain-specific, making it challenging to reuse them across problems. Traditional feature engineering and machine learning techniques can also struggle to extract complex, nonlinear patterns from multivariate time series data. Additionally, selecting the most relevant characteristics from a large feature collection is essential and calls for dimensionality reduction techniques. Furthermore, feature extraction and selection are computationally expensive; as feature dimensionality increases, the cost of feature selection can grow exponentially, and search methods do not always find the best feature set for a specific ML model.

To overcome the difficulty of obtaining useful and reliable features from time series data, several researchers have concentrated on DL approaches. DL removes the need to hand-craft features for ML algorithms; instead, it can learn a hierarchical representation of the features on its own. This eliminates the need for feature space reconstruction and data pre-processing in a standard machine learning pipeline. Deep learning is based on artificial neural networks, where "deep" refers to the number of layers in the network. In DL approaches, deep neural networks extract pertinent features through high-level data representation. An enticing property of DL techniques is their capacity to work with raw data and to automate feature extraction and selection. The network is fed with time series samples, and each nonlinear transformation creates a hidden representation of the inputs from the preceding layer, resulting in a hierarchical data representation. Each layer of the deep network uses a nonlinear mapping to translate the results of the layer before it into a new feature set.

Deep learning methods such as CNNs (Ahmed et al. 2022), autoencoders, DBNs (Zheng et al. 2014), and RNNs have recently had a significant impact, including in NLP (Hollenstein et al. 2021). Different deep architectural models have been put forward and applied to EEG signals, producing findings comparable to those of traditional techniques. Recently, DL has been used to build reconfigurable emotion recognition systems because of its capacity to offer high-level data abstraction. Many DL models have recently been put into practice for the classification of EEG signals for the identification of human emotions. These techniques include CNNs, which are frequently used as classification methods, and RNNs, of which LSTM networks are a special type. In the following sections, we provide a brief description of each, and Table 12 details some existing deep learning methods used for EEG-based emotion recognition.

Table 12 Related studies using deep learning algorithms

Convolutional neural network: The convolutional neural network is a deep, feed-forward artificial neural network. The three layers that make up a CNN are (i) an input layer, (ii) several hidden layers, and (iii) an output layer. The neural framework of a CNN is built from trainable weights and biases. Each neuron receives inputs, computes a dot product, and applies a non-linearity. The hidden layers comprise a sequence of convolutional layers that convolve their inputs with learned filters. Weight sharing and local connections reduce the network's complexity and exploit the local structure of the input (Kamble and Sengupta 2023).
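
The sketch below is a minimal, illustrative 1-D CNN for windowed multichannel EEG, written in PyTorch; the channel count, window length, and number of emotion classes are assumptions, and the architecture is not taken from any cited study.

```python
# Minimal 1-D CNN for EEG windows: convolutional hidden layers with weight
# sharing and local connections, followed by a linear output layer.
import torch
import torch.nn as nn

class EEGConvNet(nn.Module):
    def __init__(self, n_channels=14, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),  # local temporal filters
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                              # pool over time
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (batch, channels, samples)
        return self.classifier(self.features(x).squeeze(-1))

model = EEGConvNet()
dummy = torch.randn(8, 14, 512)                # 8 windows of 14-channel EEG, 512 samples
print(model(dummy).shape)                      # torch.Size([8, 4]) class scores
```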

Long short-term memory network: LSTMs are a class of recurrent neural networks (RNNs) with a distinct design. Hochreiter and Schmidhuber introduced them in 1997 to alleviate the long-term dependency problem of RNNs (Hochreiter and Schmidhuber 1997). Learning long sequences with typical RNNs using backpropagation through time (BPTT) can be difficult because of the vanishing/exploding gradient problem (Guo 2013; Hochreiter et al. 2001). To address this issue, the plain RNN cell is replaced with a gated cell, such as an LSTM cell. The memory block and gates in LSTM cells regulate how information passes through the link, with multiple connections to and from these gates. The network's temporal state is kept in self-connected memory cells, and information flow is regulated by the gates in the memory blocks (Hochreiter and Schmidhuber 1997). A memory block originally had three gates: input, forget, and output. The forget gate uses a sigmoid layer to determine which information to discard from the cell state. The input gate uses a sigmoid layer to identify the values to update and a tanh layer to generate a vector of candidate values. Finally, the output gate's sigmoid layer decides which parts of the newly updated cell state are emitted, and the current output is computed from that gate and the cell state.
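
The gate computations described above are commonly written as follows (standard LSTM formulation; sigma denotes the logistic sigmoid and the circled dot element-wise multiplication):

```latex
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
h_t = o_t \odot \tanh(c_t)
```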

Recurrent neural network: Recurrent neural networks (RNNs) are used in deep learning to process variable-length sequential data, such as time series, sound, or natural language. An RNN can be viewed as a feedforward network with recurrent connections, unfolded over time. The network uses temporal correlations to represent the input history and predict outputs. In typical neural networks, inputs and outputs are assumed to be independent; by contrast, RNNs learn sequential data over time through cyclic connections. Dynamic temporal patterns can be captured and stored in RNNs because of the internal feedback loops in each hidden layer. The hidden layer of an RNN consists of several nodes that generate outputs based on the current inputs and the previous hidden states. RNNs can be trained using the backpropagation through time (BPTT) algorithm (Guo 2013). Training RNNs can be challenging due to exploding and vanishing gradients, which make it difficult to back-propagate gradients over long time intervals (Hochreiter et al. 2001; Reddy and Delen 2018) and limit access to the context needed for sequential data. Long short-term memory (LSTM) and gated recurrent unit (GRU) networks have therefore become popular alternatives.
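
The recurrence described above, in which each hidden state depends on the current input and the previous hidden state, can be written in its standard form as

```latex
h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad y_t = W_{hy} h_t + b_y
```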

Probabilistic neural network: PNNs are a form of deep, feedforward neural network that employs a Bayesian approach. PNN's high accuracy and noise/error tolerance come from its straightforward design and high learning capacity compared to SVM. In a study using PNN for EEG-based emotion recognition with sub-band power features, Zhang et al. (2017a, b) found that PNN gives a slightly lower classification rate than SVM, for arousal (PNN 81.69%, SVM 82.26%) and valence (PNN 82.41%, SVM 82.67%), while using fewer channels to achieve a comparable outcome: PNN used only 9 channels for valence whereas SVM used 19, and 8 channels for arousal whereas SVM used 14.

Fig. 17
figure 17

A graphical model of DBN (Zheng et al. 2014)

Deep belief network: A deep belief network (DBN) is a specific kind of neural network that can be thought of as a probabilistic generative model or a generative graphical model. It is composed of multiple layers of latent variables (hidden units), with connections between adjacent layers but not between units within the same layer. The model is built by stacking a number of restricted Boltzmann machines (RBMs), with the output of the lower-level RBMs feeding into the higher-level RBM, as shown in Fig. 17.
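
Each RBM in such a stack is an energy-based model over a visible layer v and a hidden layer h; in the standard binary formulation the joint distribution is

```latex
E(\mathbf{v}, \mathbf{h}) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i\, w_{ij}\, h_j,
\qquad
P(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{Z}
```

where a and b are the visible and hidden biases, W the connection weights, and Z the partition function; the RBMs are trained layer by layer and then stacked to form the DBN.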

Fig. 18
figure 18

Applications of EEG-based emotion recognition

In order to separate happy and sad emotions in EEG data, Zheng et al. (2014) presented an improved DBN with additional differential entropy (DE) features. When a hidden Markov model (HMM) was incorporated, accuracy increased to an average of 87.62%. Experimental findings reported in Zheng and Lu (2015) show that the DBN classifier outperforms the SVM and K-NN classifiers at distinguishing between three classes of emotions (positive, negative, and neutral).

6 Summary of the findings

We reviewed more than 200 publications for this article, covering not only the cutting-edge emotion identification methods that have recently been presented but also the datasets that are now accessible, and outlining the key components of a data-driven emotion recognition pipeline. In this section, we list some of the key conclusions drawn from this survey.

The subject of human–computer interaction has seen a surge in interest in emotion recognition due to the development of automation and human–machine systems technologies. This research examines methods for EEG-based emotion recognition. EEG reacts immediately to emotional changes, and emotions can be recognised by extracting, reducing, and then classifying EEG features. The general process of EEG-based emotion recognition consists of the following computational steps: data collection, data preprocessing, feature extraction, feature dimensionality reduction, and classification. A complete review of this emotion recognition pipeline is the main topic of this study. The paper's main findings are summarised as follows:

  1. In much of the literature, each emotion dimension is treated only as a binary classification problem.

  2. Traditionally, a predetermined threshold on the subjective rating data is used to label the actual/true/target emotion classes. Unfortunately, choosing the right threshold can be challenging. A novel approach is to examine the valence and arousal dimensions jointly and use data clustering techniques to identify the target emotion classes.

  3. Including baseline EEG information: in numerous investigations on emotion recognition, the baseline EEG data was ignored in favour of the EEG data collected under the various emotional conditions.

  4. An evaluation of various data gathering tools and pre-processing methods for EEG signals is provided for emotion identification. Publicly accessible datasets are briefly reviewed, and it is found that the DEAP and SEED datasets are used in the majority of the research.

  5. Several feature extraction techniques are reviewed, covering time, frequency, time–frequency, and wavelet features. It is found that entropy-based features are particularly effective when recognising emotion from EEG data.

  6. A comparison of several dimensionality reduction strategies is provided. We found that the most effective feature dimensionality reduction methodology varies with the type of feature extraction used.

  7. Various machine learning-based classifiers, including KNN, NB, SVM, and RF, are discussed. It is found that SVM and RF outperform KNN and NB for EEG-based emotion recognition. Additionally, a variety of deep learning models, such as CNN, DBN, LSTM, RNN, and PNN, are reviewed.

7 Applications of EEG-based emotion recognition

Emotion recognition systems have many uses in various industries, including education (Antonenko et al. 2010), automotive (Katsis et al. 2015), healthcare (Huang et al. 2021), and entertainment (Du et al. 2023). Figure 18 shows some applications of emotion recognition. These systems can identify and interpret emotions through continuous real-time EEG monitoring and can adjust their responses and actions accordingly. As indicated in Fig. 18, these applications can also be divided into two categories: medical and non-medical.

7.1 Medical Applications


Medical professionals use EEG-based emotion detection devices to assess, improve, and diagnose patients' emotions. These medically adapted EEG systems are also used to diagnose neurodevelopmental disorders, which can affect memory, emotions, learning, behaviour, and communication. The most common include autism, depression, and schizophrenia, among many other mental and neurological disorders.

Doctors can more accurately evaluate a patient's physical state and consciousness by using computer-aided examination of their emotions. Most of the current research on emotion recognition uses offline analysis. Huang et al. (2021) used an online system to recognize emotions in patients with impairments of consciousness for the first time. They were successful in inducing and detecting the emotional traits of some patients with consciousness issues in real time using this method. These experimental findings demonstrated that affective BCI systems have a lot of promise for identifying emotions in people with consciousness disorders. In an earlier work, Bălan et al. (2021) used the DEAP dataset to construct an autonomous emotion identification model employing SVM, LDA, KNN, and RF classifiers. The researchers developed a smart virtual therapist that uses physiological signals (EEG, ECG, and EDA) to identify human emotions and offers support, advice, and speech characteristics tailored to the situation.

Depression is a severe mental health condition with significant economic and societal implications. Many EEG-based biomarkers have been developed after numerous studies investigated the use of emotion recognition systems in depression diagnosis. de Aguiar Neto and Rosa (2019) wrote an overview of depression biomarkers based on EEG analysis. A physiological dataset was created by Cai et al. (2018) with 213 people (92 of whom had depression and 121 were healthy). During both resting condition and sound stimulation, EEG data were captured. They used KNN, DT, SVM, and NN classifiers and could identify depression with an accuracy of up to 79%. Additionally, by assisting in the perception and expression of feelings, emotion recognition systems can potentially improve the quality of life for people with a range of genetic illnesses, including autism.

Autism spectrum disorders (ASD) can be diagnosed using factors other than mood issues; however, clinicians have traditionally used patients' emotional behaviours as the foundation for autism diagnosis. Numerous studies have demonstrated that emotion classification based on EEG data processing can considerably enhance the social integration of people with neurological disorders such as acute Alzheimer's disease (Gonzalez et al. 2019).

7.2 Non-Medical Applications


EEGs have been used in many fields other than medicine. These applications serve both physically fit people and people with physical limitations, and span non-medical areas such as entertainment, education, gaming, and monitoring. The paragraphs below describe the main non-medical application categories.

Education: In educational settings, students wore portable EEG devices with emotion recognition to track their emotions while receiving remote teaching. According to Elatlassi (2018), real-time biometric measures such as EEG and eye-tracking can model student acuity, performance, and motivation, and thus student engagement, in online contexts; these biometrics were tested in an online learning environment. Shen et al. (2009) employed SVM on PPG, EDA, and EEG signals to detect confusion, boredom, hopefulness, and engagement, emotions common during learning, with 86% accuracy. Their emotion-aware e-learning prototype, which offered solutions based on the learner's mood, was compared to a baseline programme and was shown to reduce the required inputs and improve effectiveness. Zhou et al. (2022a, b) conducted an in-depth analysis of machine learning approaches to recognising cognitive workload from EEG data. EEG-based cognitive workload recognition has been used in many fields, including education, air traffic control, and autism treatment (Aricò et al. 2016; L. Zhang et al. 2017a, b). Traditional machine learning techniques are still used alongside deep learning.

Automotive environment: EEG-based emotion detection improves autonomous car autopilot accuracy by adding an emotion identification system (Park et al. 2018). Brain-computer interface technology can directly detect and send the passenger's emotions to the driverless system, allowing it to adjust its driving mode. Gwak et al. (2018) used physiological, behavioural, and driving performance variables to quantify drivers' alertness. To establish the association between driver arousal, physiological indicators such as EEG and ECG, behavioural assessments, and driving performance, driving simulator (DS) and driver monitoring system data were reviewed. From 10 s of data, machine learning differentiated awake and drowsy states using 32 features: KNN, LR, SVM, and DT classified driver fatigue, and Random Forest discriminated awake from moderately drowsy phases with 81.4% accuracy. Many accidents, especially traffic accidents, are caused by fatigue. EEG-based driver fatigue estimation is less dependent on overt behaviour and less easily faked (Mühl et al. 2014), and because it originates deep in the brain it can identify fatigue sooner than facial expressions. EEG-based driver fatigue estimation uses many methods, including transfer learning (Cui et al. 2019; Wei et al. 2018; Wu et al. 2017).

Gaming: In entertainment research, game assistant systems with emotional feedback control can be built on physiological and EEG signals, giving players a stronger sense of engagement and highly interactive experiences. Emotion-enabled applications, such as emotion-based music therapy, were developed and implemented in the study of Sourina et al. (2011), along with EEG-based "serious" games for focus training.

8 Open issues

Current ML and DL algorithms can be used with BCI devices to record EEG data and analyse them for practical emotion detection applications, but several problems remain to be remedied. Although recognition accuracy varies greatly depending on the application and is largely dependent on the datasets used, previous approaches to emotion detection using EEG signals have shown classification accuracies of more than 80% on average. The evaluation revealed the following unresolved problems and directions for further study in the area of emotion recognition.

  1. Current research focuses on the subject-dependent emotion recognition problem, which needs a customised classifier for each person. Real-world situations would greatly benefit from an emotion detection model that is subject independent and appropriate for a variety of people. As per the studies by Du et al. (2018) and Sangineto et al. (2014), transfer learning techniques must be incorporated into the subject-independent classifier model in order to attain emotion detection results that are consistent across people.

  2. Most of the existing EEG datasets were gathered in laboratories using visual induction methods. The emotional state of the volunteers prior to the experiment was not considered in earlier studies. These individual variations can lead to inconsistency in datasets.

  3. In several investigations, each emotional dimension was only considered in binary form.

  4. At the moment, the discrete model and the continuous model make up the majority of the theoretical foundation for emotion recognition. Although they are related to one another, no unified theoretical framework has been developed for them.

  5. In the Internet era, protecting user privacy with regard to personal information is a crucial moral and ethical concern. The EEG and other physiological data gathered in affective computing are part of users' private information, so privacy protection should be taken seriously.

  6. Employing unobtrusive gadgets, such as smart bands, watches, or straps that can be worn without much difficulty, will help in designing an emotion recognition system suited for everyday use.

  7. In most studies on emotion recognition, researchers ignored the baseline (spontaneous) EEG data in favour of examining EEG data under various emotional states.

  8. The literature does not address EEG-based recognition of mixed emotions, which involve positive and negative affect experienced simultaneously. These conflicting feelings are intriguing because they are related to research on enhancing creative ability.

9 Future trends and research directions

In addition to the considerations mentioned above, the following points should be taken into account in future development.

  1. A predetermined subjective rating threshold has historically been used to label the actual emotion classes, but choosing the right threshold is very challenging. A novel approach is to consider the valence and arousal dimensions simultaneously and then use data clustering techniques to identify the real emotion classes.

  2. Components of EEG-based BCI systems, such as feature extraction and selection, are constantly evolving. They should be founded on a solid knowledge of the biology and physiology of the brain.

  3. Emotional models dealing with a greater number of dimensions need to be created. The two-dimensional emotion model is the most commonly used at present; higher-dimensional emotion models must be developed in order to recognise multiple classes of emotions.

  4. It is important to continue researching the relationship between explicit information in emotional computing, such as discrete emotions, and implicit information, such as the signal properties of the various EEG frequency bands corresponding to those discrete emotions. Understanding how they relate to one another is crucial for comprehending the various emotional states indicated by EEG signals.

  5. The majority of publicly accessible datasets for affective computing use images, videos, music, and other external means to elicit emotional responses for EEG-based emotion recognition. These emotional changes are passive, as opposed to the active emotional changes that people undergo in real scenarios, which may cause variances in their EEG patterns. Therefore, it is worthwhile to research how to distinguish between internally driven, active emotional change and externally induced emotional change.

  6. Improved machine learning methods, such as deep learning and compact machine learning, need to be developed. Emotions are a product of subjective and complex cognitive processes, so it is challenging to provide a recognition approach based only on traditional ML techniques.

  7. Conventional time series analysis methods must be combined with machine learning strategies in order to track temporal emotional variations in a timely manner.

  8. Most engineering methods for identifying emotions show that arousal classification is typically more accurate than valence classification. A likely explanation is that changes in arousal level are directly related to ANS activity, whereas the valence level requires a factor analysis of cross-associated ANS reactions. As a result, a framework for categorising emotions along valence is needed, together with a range of valence-relevant features extracted from EEG data across several analysis domains.

  9. Since video games are faster at simulating "real-life" events and better at evoking emotion, additional datasets built with active elicitation approaches are needed.

10 Challenges

One of the most essential tasks in emotion classification is the capacity to recognise emotions from physiological information. Emotions can be recognised from EEG signals, writing, voice, and facial expressions. The problem has grown for psychologists and researchers because EEG readings are internal brain signals that a person cannot control. Finding techniques that reliably classify emotions from EEG data, or selecting an appropriate classifier, is one of the major challenges. The primary problem here is the datasets' small sample sizes: for a model to generalise successfully to new or previously unseen data, it must first be trained and validated on a large number of subjects. Applying data augmentation techniques effectively helps address the issue of a small dataset, as shown in Luo et al. (2020); a simple illustrative sketch is given below. Another issue is noise in the signal, particularly in low-frequency regions, where it is exceedingly difficult to remove noise through raw data processing. A number of approaches have been used for data filtering, including FIR filters, adaptive filters, and bandpass filters. In order to acquire high-quality, noise-free EEG signals, it is essential to develop better hardware acquisition equipment and to apply effective pre-processing (noise reduction and artefact removal) techniques. Before acquiring an EEG signal, individuals should be instructed not to blink or perform other actions that could introduce artefacts into the recording; then, as stated in Sect. 3.2, a variety of pre-processing approaches can be applied to the remaining, unavoidable artefacts. Reducing the size of the input feature set, handling the data efficiently, and the irregular nature of EEG signals present further challenges. It is crucial to choose features that have a strong capacity to describe emotional state, and many algorithms can eliminate unnecessary or irrelevant features to reduce the number of dimensions.
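
As one hedged illustration of the augmentation idea mentioned above, the sketch below creates noisy copies of EEG training windows by adding low-amplitude Gaussian noise; this simple additive-noise variant is only illustrative and is not the generative approach used by Luo et al. (2020).

```python
# Simple EEG data augmentation: append Gaussian-noise-perturbed copies of
# each training segment to enlarge a small dataset.
import numpy as np

def augment_with_noise(segments, noise_std=0.05, copies=2, seed=0):
    """Return the original segments plus `copies` noisy versions of each.

    segments: array of shape (n_trials, n_channels, n_samples)
    """
    rng = np.random.default_rng(seed)
    augmented = [segments]
    for _ in range(copies):
        augmented.append(segments + rng.normal(0.0, noise_std, size=segments.shape))
    return np.concatenate(augmented, axis=0)

eeg = np.random.default_rng(1).normal(size=(40, 14, 512))   # 40 trials, 14 channels, 512 samples
print(augment_with_noise(eeg).shape)                        # (120, 14, 512)
```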

11 Conclusion

The significance of emotion recognition in the field of HCI has increased as technology and human interface technologies advance. EEG-based BCI emotion recognition has drawn plenty of attention in the field of affective computing in recent times. Due to considerable advancements in the development of accessible and affordable BCI devices, various research investigations have been conducted. For this review, we examined 145 publications. EEG signals are reliable data that are difficult to simulate or manipulate, and EEG responds immediately to changes in emotion. Since several emotions can engage the same brain regions or, conversely, a single emotion might activate several structures, affective states cannot simply be mapped to particular brain structures.

We gave a general overview of BCI and its many approaches and applications. In order to characterise the output of emotional analysis, we divided the current emotion models based on psychological concepts into discrete and dimensional models, and we outlined various methods for evoking emotions. We also provided an overview of the human brain, the EEG signal, and its connection to emotions, and discussed modern methods for identifying EEG emotions that have been developed recently. In addition, standard databases for training DL- or ML-based affective models are needed for the development of affective computing; we therefore took into account the already-available datasets, discussed those primarily utilised for emotion recognition, and described the key elements of the EEG-based emotion detection pipeline. The general technique for EEG-based BCI emotion recognition consists of the following steps: data collection, pre-processing, feature extraction, feature selection, classification, and performance evaluation. For the data collection stage, we examined a variety of EEG acquisition systems and their electrode distributions, and we gave a quick overview of the several methods for normalising EEG readings. Several pre-processing methods, such as filtering, artefact removal, baseline correction, and data augmentation, were the subject of our review. Different feature extraction and classification strategies were explained and compared from various angles for the extraction and classification of emotional features from EEG.

According to our study, no single feature extraction or classification strategy stands out as the best option for all applications; the decision depends on the particular system paradigm and task. It is advisable to consider as many algorithms as feasible, including pre-processing and synchronisation, to assess the viability of a proposed method, and in most cases to compare a variety of features and methodologies before settling on one that delivers adequate performance for the specific application. Academics, researchers, and professionals working in the area of emotion recognition and detection can find reliable reference material in our descriptive review, and this study also aids comprehension of the many application principles for emotion recognition utilising BCI in various contexts. This overview lists the uses of EEG-based emotion recognition in both medical and non-medical fields. Our findings indicated a sharp increase in journal papers relating to EEG-based emotion recognition, demonstrating a rise in research interest in EEG-based emotion detection as a credible and salient field of study. This expansion was sparked by factors including the widespread use of wireless EEG equipment, sophisticated computational intelligence methods, and machine learning. We anticipate that the quantity of EEG-based emotion detection research will continue to grow rapidly in the near future.

As this review demonstrates, investigating the connection between brain signals and emotions is challenging, and new approaches and applications are continually being created. It is anticipated that many of the outstanding problems and difficulties encountered in this research will be overcome shortly, opening the door for a broad range of possible applications based on EEG-based emotion recognition. It is hoped that this review will give researchers a better understanding of where things stand in identifying and classifying emotion-oriented EEG signals.