Emotion recognition for semi-autonomous vehicles framework

  • Javier Izquierdo-Reyes
  • Ricardo A. Ramirez-Mendoza
  • Martin R. Bustamante-Bello
  • Jose L. Pons-Rovira
  • Jose E. Gonzalez-Vargas
Technical Paper


Humans, in their curiosity, have long wondered how to make machines feel and, at the same time, how a machine could detect emotions. The ability to feel emotions is perhaps one of the human capacities that machines cannot replace. In recent years, this hypothesis has been increasingly questioned by scientists who seek to understand the phenomena of brain functioning using state-of-the-art instrumentation, sensors, and signal processing. Today, scientists have powerful machine learning methods with which to address this issue. The field of emotion detection is gaining significance as technology advances, particularly owing to current developments in machine learning, the Internet of Things, Industry 4.0, and autonomous vehicles. Machines will need to be equipped with the capacity to monitor the state of the human user and to change their behaviour in response. Machine learning offers a route to this and should be able to make use of data collected from questionnaires, facial expression scans, and physiological signals such as electroencephalograms (EEG), electrocardiograms, and galvanic skin response. In this study, an approach is proposed to identify the emotional state of a subject from data collected in elicited emotion experiments. An algorithm using EEG data was developed, with the power spectral density of the cerebral frequency bands (alpha, beta, theta, and gamma) as features for classifier training. A K-Nearest Neighbors algorithm using the Euclidean distance was used to predict the emotional state of the subject. This article proposes a novel approach for emotion recognition that depends not only on images of the face, as in the previous literature, but also on physiological data. The algorithm was able to recognize nine different emotions (Neutral, Anger, Disgust, Fear, Joy, Sadness, Surprise, Amusement, and Anxiety), nine valence positions, and nine positions on the arousal axis. Using the data from only 14 EEG electrodes, an accuracy of approximately 97% was achieved. The approach was developed for evaluating the state of mind of a driver in the context of a semi-autonomous vehicle, for example. However, the system has a much wider range of potential applications, from the design of products to the evaluation of the user experience.


Electroencephalography · Emotion recognition · K nearest neighbor · Autonomous vehicles · Semi autonomous

1 Introduction

Over the past years, the study of engineering approaches to the sensing of emotions has gained importance, as machine learning has started to give way to affective computing. The goal of the discipline is to discover new ways of managing interactions between machines and humans. A key area is the creation of novel environments that the user finds more natural [12]. Another area concerns the evaluation of the user experience. Here the tracking of emotional responses is allowing detailed matching of customer preferences, among other things. The goal is to create a direct connexion to the human experience (Fig. 1).
Fig. 1

Graphical abstract

From our perspective, interactive design must be a process in which the user is involved, so that his or her emotions and requirements are included in the final product. Similarly, we understand interactive engineering as a process in which the engineering work interacts with the user. In the present approach, this means using a mathematical approximation to read the cerebral bands and assign an objective value to the user's emotions.

Among the proposed approaches to identifying emotions, valence, and/or arousal, the most widely used has been the capture of facial expressions using cameras. A few such systems are already commercially available, including SHORE™ and iMotions™. These systems use facial action units to assign a probability to the emotional state of the user.

Physiological measures have included electroencephalograms (EEG), electrocardiograms (ECG), and galvanic skin response (GSR). However, their high computational costs and the complexity of signal acquisition limit their range of applications. Studies that have applied these techniques have been mainly confined to closely controlled environments. In this study, we investigated a novel approach to the use of EEG data in a Self-Assessment Manikin (SAM) method. Frequency and power density data were used to cluster and identify the emotional state of a subject when exposed to a particular stimulus.

The algorithm was designed to be universally applicable. Only the acquisition system had to be recalibrated to the individual user to obtain a baseline, avoiding the need for complete retraining.

At the outset of the study, two databases were identified that record the physiological signals associated with elicited emotions: the Database for Emotion Analysis using Physiological Signals (DEAP) [8] and the MAHNOB-HCI Tagging Database [15]. While both had been compiled using multiple subjects from different parts of the world, no Latin-American participants appeared in the register. Therefore, to ensure the universality of the algorithm, the database should be extended to include Mexican subjects.

Our approach achieved a detection rate of approximately 97%, a significant improvement over the best previously reported result, that of Chen [2], which achieved a maximum detection rate of 76.17%.

The rest of this paper is organized as follows. Section 2 discusses the current state-of-the-art in emotion recognition using EEG data. Section 3 introduces the approach used in this study. Experiments in on-line implementation were conducted, and these are reported in Sect. 4. Finally, Sect. 5 presents our conclusions and suggestions for future work.

2 State of the art

One of the preferred methods of emotion recognition uses facial expressions. Garbas [4] presented an approach to detecting valence from the facial image acquired by a web-cam. Thermal imaging has also been used [19, 20]. Kolli [9] reported the successful use of this technology in noisy environments, e.g. driving.

The health and emotional state of subjects are related to cardiac activity via the autonomic nervous system, suggesting that ECGs can play an important role in emotion recognition. Xu et al. [22] developed an improved binary particle swarm optimisation system for the identification of joy and sadness. Their system was based on the P-QRS-T wave and achieved an average accuracy of approximately 88%.

In a proposal similar to ours, Selvaraj [14] developed an ECG-based algorithm that was able to recognize six basic emotional states when audiovisual stimuli were applied. The method used nonlinear features (Hurst) and Bayesian, regression tree, KNN, and fuzzy KNN classifiers, achieving a maximum accuracy of 92.87% in random validation and 76.45% in subject-dependent validation. The ECG-based algorithm proposed by Tivatansakul [17] used local binary patterns (accuracy 84.17%) and local ternary patterns (accuracy 87.92%) to recognize only negative emotions.

Galvanic skin response has been less widely investigated but was demonstrated by Wu et al. to be capable of determining the emotional state of a subject [21]. In this study, 30 features were extracted in the time and frequency domain from GSR, and six different emotional states were identified, with a maximum accuracy of approximately 74 and 86% in training.

Conversely, EEG has been one of the most used methods. This is logical, as all cognitive and control processes occur in the brain. Chen et al. [2] used the DEAP dataset to recognize emotions via connectivity between EEG channels, achieving accuracies of 76.17% for valence and 73.59% for arousal. The main contribution of that study was the use of different algorithms based on Pearson correlation, phase coherence, and mutual information to identify correlations in the EEG data. The gamma frequency band was reported to be the most effective feature for identifying the valence level.

Chang [1] applied the global synchronization index (GFS) to arousal, and reported that the GFS beta band decreased in the states of strong emotional arousal. This finding was confirmed in Kumar’s 2016 study [10], which established a negative correlation between the increases in the theta and alpha bands, decreases in the beta band, and the level of arousal. Chang used a professional clinical EEG, whereas Kumar used an Emotiv Epoc™ device, yet the two studies produced comparable results. This suggested the following:
  1. Non-clinical devices can be used in emotion recognition, and
  2. the frequency bands are more robust against noise than time domain signals.

Whereas most studies have used a single physiological variable, the emotion recognition system of Torres [18] was multi-modal. The system combined EEG with GSR and, by selecting the most appropriate features, achieved an average accuracy of 79.25% for arousal and 75.41% for valence.

The use of more complex algorithms, such as those of Jirayucharoensak [7], has not been shown to produce superior results.

In the case of vehicle design, many approaches have been published. For example, Petiot [13] presents a model for evaluating the design of vehicle parts using the subjective opinion of the final user; the result is a statistical procedure for evaluating the design of a product that includes the user’s requirements and preferences.

Cheutet [3], on the other hand, presents an environment for knowledge-based modeling in the conceptual design phase of an automobile. The output of the algorithm proposed in this work can act as an input to the methods presented in both of these works.

Based on this literature review, this study used frequency bands as features, following the approach of [7]. In contrast with that study, however, a KNN classifier was used as the final classifier.

3 Methodology

At the outset of the study, two databases were identified in which physiological signals are associated with elicited emotions: the Database for Emotion Analysis using Physiological Signals (DEAP) [8] and the MAHNOB-HCI Tagging Database [15]. Both databases record the register of the EEG for each stimulus, and time-synchronize this with the other physiological variables and with videos of the facial expression.

The characteristics and format of the files made the HCI Tagging Database more suitable for our purposes, and this was used as the main data source. This database was compiled by the Intelligent Behaviour Understanding Group of Imperial College, London. It comprises a register of physiological variables and videos taken from 30 participants. Stimuli were systematically presented to induce emotions. The variables were as follows:
  1. 32 EEG channels,
  2. 3 ECG derivations,
  3. 1 GSR channel,
  4. Respiration belt, and,
  5.


Data were recorded using a Biosemi Active II device and stored in BDF format. In addition, six cameras recorded the face of the subject from different positions to capture facial responses and eye tracking data. Detailed information on the experimental setup can be found at the HCI Database webpage.

3.1 Signal processing

Figure 2 outlines the experimental procedures. After the data had been downloaded, the 32 EEG channels were disaggregated from the BDF file of each participant, in both the pre-stimulus and stimulus periods. The analysis was made more complex by an overlap between the signals from the final part of the pre-stimulus period and those from the start of the stimulus exposure. A specific routine was introduced to identify the overlapping signals, and these were assigned to the stimulus exposure period.
Fig. 2

Proposed methodology

The main objective was to assess the effectiveness of the minimally invasive Emotiv Epoc™ headset in detecting emotional states across a range of environments. In this study, therefore, only the data from the electrodes matching the Emotiv™ electrode distribution were used. Based on the international 10–20 system, these were electrodes AF3, AF4, F7, F3, F4, F8, FC5, FC6, T7, T8, P3, P4, P7, P8, O1, and O2. The original data were sampled at 256 Hz and were downsampled to 128 Hz to match the target platform.

We then applied a method adapted from the work of Jirayucharoensak et al. [7]. The signals were filtered with cut-off frequencies of 4 and 40 Hz, using a high-pass and a low-pass sixth-order Butterworth filter.
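The pre-processing described above can be sketched as follows. This is a minimal illustration using SciPy, not the authors' implementation: the function name `preprocess`, the use of `resample_poly` for downsampling, and the zero-phase (forward-backward) filtering are all assumptions, since the text only specifies the sampling rates, the cut-off frequencies, and the filter order.

```python
import numpy as np
from scipy import signal

FS_RAW = 256      # original sampling rate in Hz (from the text)
FS_TARGET = 128   # target sampling rate in Hz (from the text)

def preprocess(eeg, fs_raw=FS_RAW, fs_target=FS_TARGET):
    """Downsample and band-limit EEG; eeg has shape (n_channels, n_samples)."""
    # resample_poly applies its own anti-aliasing filter before decimating.
    factor = fs_raw // fs_target
    x = signal.resample_poly(eeg, up=1, down=factor, axis=-1)
    # Sixth-order Butterworth high-pass (4 Hz) and low-pass (40 Hz) filters.
    hp = signal.butter(6, 4.0, btype="highpass", fs=fs_target, output="sos")
    lp = signal.butter(6, 40.0, btype="lowpass", fs=fs_target, output="sos")
    # Zero-phase filtering is an assumption; the paper does not specify it.
    x = signal.sosfiltfilt(hp, x, axis=-1)
    x = signal.sosfiltfilt(lp, x, axis=-1)
    return x
```

Second-order sections (`output="sos"`) are used here because they tend to be numerically better behaved than transfer-function coefficients for a sixth-order filter with a low cut-off.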

The Welch algorithm was applied to estimate the power spectral density (PSD) of each EEG channel, using a Hanning window of 128 samples. Estimation was conducted separately for the pre-stimulus and stimulus periods, and the mean PSD of the pre-stimulus was subtracted from that of the stimulus for each channel.
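As a hedged sketch of this step, the baseline-corrected PSD could be computed with `scipy.signal.welch` as shown below. The function name `baseline_corrected_psd` is hypothetical, and the segment overlap is left at SciPy's default (half a segment), which is an assumption because the overlap used in the study is not stated.

```python
import numpy as np
from scipy import signal

FS = 128  # sampling rate after downsampling, in Hz (from the text)

def baseline_corrected_psd(stimulus, pre_stimulus, fs=FS, nperseg=128):
    """Return (freqs, psd) with the pre-stimulus PSD subtracted per channel.

    Both inputs have shape (n_channels, n_samples).
    """
    # "hann" is SciPy's name for the Hanning window; 128 samples = 1 s here.
    freqs, psd_stim = signal.welch(stimulus, fs=fs, window="hann",
                                   nperseg=nperseg, axis=-1)
    # Welch averages over segments, so this is the mean pre-stimulus PSD.
    _, psd_base = signal.welch(pre_stimulus, fs=fs, window="hann",
                               nperseg=nperseg, axis=-1)
    return freqs, psd_stim - psd_base
```

With a 128-sample window at 128 Hz, the frequency resolution is 1 Hz, so each returned bin corresponds to one integer frequency between 0 and 64 Hz.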

The use of the Welch algorithm to estimate the PSD was reported in [16] as follows:
  1. Partition the data sequence
    $$\begin{aligned} x[0], x[1],\ldots , x[N-1] \end{aligned}$$
    into L segments or batches:

    Segment 1: \(x[0], x[1],\ldots , x[M - 1]\)

    Segment 2: \(x[S], x[S + 1],\ldots , x[M + S - 1]\)

    ...

    Segment L: \(x[N - M], x[N - M + 1],\ldots , x[N - 1]\)

  2. For each segment (\(l = 1\) to L), compute a windowed discrete Fourier transform (DFT) at the frequencies \(\nu = i/M\), with \(-(M/2-1)\le i \le M/2\):
    $$\begin{aligned} X_l(\nu ) = \sum _m x[m]\,\omega [m-(l-1)S]\,e^{-j2\pi \nu m} \end{aligned}$$
    where \(m = (l-1)S,\ldots ,M + (l-1)S-1\) and \(\omega \) is the window function (the Hanning window in our case).

  3. For each segment (\(l = 1\) to L), form the modified periodogram value \(P_l(\nu )\) from the Fourier transform:
    $$\begin{aligned} P_l(\nu ) = \frac{1}{W}|X_l(\nu )|^2 \end{aligned}$$
    $$\begin{aligned} W= \sum _{m = 0}^{M-1} \omega ^2[m] \end{aligned}$$
  4. Average the periodogram values to obtain Welch’s estimate of the PSD:
    $$\begin{aligned} S_x(\nu ) = \frac{1}{L} \sum _{l = 1}^L P_l(\nu ) \end{aligned}$$

    \(M = \) Length of each segment (batch size) and also the length of the DFT.

    \(S = \) Number of points to shift between segments.

    \(L = \) Number of segments or batches.
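The four steps above can be sketched directly in NumPy. This is an illustrative implementation under stated assumptions, not the code used in the study: the function name `welch_psd` and the default values of `M` and `S` are hypothetical, the window is applied with a segment-local index, and trailing samples that do not fill a complete segment are dropped.

```python
import numpy as np

def welch_psd(x, M=128, S=64):
    """Welch PSD estimate following the four steps listed above.

    x : 1-D data sequence of length N
    M : segment length (also the DFT length)
    S : number of points to shift between segments
    Returns (nu, S_x), with nu in cycles per sample, in FFT order.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    if N < M:
        raise ValueError("sequence is shorter than one segment")
    L = (N - M) // S + 1                 # number of complete segments
    w = np.hanning(M)                    # Hanning window
    W = np.sum(w ** 2)                   # window power normalisation
    P = np.empty((L, M))
    for l in range(L):
        segment = x[l * S: l * S + M]    # step 1: l-th segment
        X_l = np.fft.fft(segment * w)    # step 2: windowed DFT
        P[l] = np.abs(X_l) ** 2 / W      # step 3: modified periodogram
    S_x = P.mean(axis=0)                 # step 4: average over segments
    nu = np.fft.fftfreq(M)               # nu = i / M (Nyquist bin shown as -0.5)
    return nu, S_x
```

Taking the DFT relative to the start of each segment changes only the phase of \(X_l(\nu )\), so the periodogram values are unaffected.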

This algorithm uses overlapping segments to estimate the PSD. The overlap is calculated from M and S as \(100[(M-S)/M]\%\).

Our approach used the theta (4–8 Hz), low alpha (8–10 Hz), high alpha (10–12 Hz), beta (12–30 Hz), and gamma (30–40 Hz) brain wave bands as features. The differences between the pairs of channels were also calculated in the same frequency bands.

The total set of features was \(\text{Features} = 5\ (\text{brain waves}) \times (14\ \text{channels} + 7\ \text{pair differences}) = 105\) rows. After band separation, the brain wave vectors had different lengths. Re-sampling was therefore applied so that all rows of the feature matrix had the same length.
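The construction of the 105-row feature matrix might look like the sketch below. Several details are assumptions made only for illustration: the channel pairing in `PAIRS`, the half-open band intervals, the common row length `n_cols`, and the use of linear interpolation for re-sampling are not specified in the text.

```python
import numpy as np

# Band edges in Hz, taken from the text.
BANDS = {
    "theta": (4, 8),
    "low_alpha": (8, 10),
    "high_alpha": (10, 12),
    "beta": (12, 30),
    "gamma": (30, 40),
}

# Hypothetical pairing of the 14 channels by row index (an assumption).
PAIRS = [(0, 13), (1, 12), (2, 11), (3, 10), (4, 9), (5, 8), (6, 7)]

def resample_rows(rows, n_cols):
    """Linearly interpolate each row to n_cols points (assumed method)."""
    old = np.linspace(0.0, 1.0, rows.shape[1])
    new = np.linspace(0.0, 1.0, n_cols)
    return np.vstack([np.interp(new, old, r) for r in rows])

def build_feature_matrix(freqs, psd, n_cols=18):
    """Stack 5 bands x (14 channels + 7 pair differences) = 105 rows.

    freqs : 1-D array of PSD frequencies in Hz
    psd   : array of shape (14, len(freqs))
    """
    blocks = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)       # bins inside the band
        band = psd[:, mask]                        # (14, n_bins_in_band)
        diffs = np.vstack([band[a] - band[b] for a, b in PAIRS])
        blocks.append(resample_rows(np.vstack([band, diffs]), n_cols))
    return np.vstack(blocks)
```

With a 1 Hz frequency resolution, the bands contain between 2 and 18 bins, which is why the rows need to be brought to a common length before they can be stacked.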

In this matrix, answers were labelled for use in model training, based on the responses of participants in the Self-Assessment Manikin section of the database. This is a valid instrument for measuring a subject’s valence, arousal, dominance, and emotional response when exposed to stimuli, and was used as the ground truth in our system.

Before model fitting, the feature matrix was randomized and divided into 70% for training and 30% for testing.

3.2 Fitting model

The most important part of the approach is the model. This was initially generated using a stacked auto-encoder, following Jirayucharoensak et al. [7]. The stacked auto-encoder achieved an accuracy of approximately 60%, whereas a trial using the algorithm proposed by Xu [22] obtained better classification results from the EEG data.

The final algorithm used was a K-Nearest Neighbor (KNN) classifier based on the Euclidean distance. Although this is one of the simplest machine learning classifiers, the results were significantly better than most of those reported for state-of-the-art classifiers. The algorithm determines the class of a point by combining the classes of its K nearest labelled points. It is a supervised algorithm in the sense that it requires the class of the points to be established in training, and the labels are treated as categorical targets despite having numerical values.

The KNN classifier takes into account the votes of the K neighbours of a single point (\(X_i\)). This can be seen in Fig. 3, where the Euclidean distance from the dot \(X_i\) to its K neighbours is computed and the dot is classified into the nearest neighbourhood.

However, differences in the magnitude of features mean that better results can be obtained if the data are standardized. This is done as follows:
$$\begin{aligned} z_{ij} = \frac{x_{ij} - \mu _j}{\sigma _j} \end{aligned}$$
Here, \(x_{ij}\) is the value of the ith sample and jth feature, \(\mu _j\) is the average of all \(x_{ij}\) for feature j, and \(\sigma _j\) is the standard deviation of all \(x_{ij}\) over all input samples.
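The Euclidean distance to each training point is calculated and used as the measure of similarity. It is computed over the k feature components \(z_i\) and \(y_i\) of the two points (this k is distinct from the number of neighbours K), as follows: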
$$\begin{aligned} D = \sqrt{\sum _{i = 1}^k{(z_i-y_i)^2}} \end{aligned}$$
In this study, the KNN model applied nine classes for each category (Emotion, Valence, Arousal). Three different KNN models were trained, using the same feature matrix.
Fig. 3

KNN algorithm
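The standardization and Euclidean nearest-neighbour vote described in this section can be sketched in NumPy as below. This is a minimal illustration rather than the study's implementation: the function names are hypothetical, samples are assumed to be stored as rows with features as columns, the test data are scaled with the training statistics, and ties in the vote go to the smallest label.

```python
import numpy as np

def standardize(train, test):
    """Z-score both sets using the training mean and standard deviation."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0           # guard against constant features
    return (train - mu) / sigma, (test - mu) / sigma

def knn_predict(z_train, y_train, z_test, k=1):
    """Classify each test row by majority vote of its k nearest neighbours."""
    preds = np.empty(len(z_test), dtype=y_train.dtype)
    for i, z in enumerate(z_test):
        # Euclidean distance from the test point to every training point.
        d = np.sqrt(np.sum((z_train - z) ** 2, axis=1))
        nearest = np.argsort(d, kind="stable")[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds[i] = labels[np.argmax(counts)]   # ties go to the smallest label
    return preds
```

To mirror the three models mentioned above, `knn_predict` would be called three times on the same standardized features, once with each label vector (emotion, valence, arousal).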

3.3 Model evaluation

After the KNN model was trained, the error was calculated using the test data set. This comprised the same kind of feature matrix and corresponding labels as those used in the training process but was shorter in length. The accuracy was calculated as the proportion of predictions that matched the test labels.

The best results, obtained using k = 1 for the nearest points and a standardized Euclidean distance, were as follows:
  • Test Emotion: 96.341%

  • Test Valence: 96.341%

  • Test Arousal: 96.222%

Table 1

Accuracy using different k

Fig. 4

Proposed ADMAS block diagram [6]

This demonstrated that the algorithm was able to predict the SAM score of a user in response to a specific stimulus, using only the EEG data.

4 Analysis of results

To avoid the effect of noise in the data, the best option is theoretically to use \(k > 2\). However, application of this rule has an effect on the overall accuracy. In a real-time process, it is necessary to use a different value of K to maintain satisfactory performance across the whole system. Table 1 demonstrates the performance of the algorithm at different values of K, when the training feature matrix was held constant.
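A sweep over K of this kind can be sketched as follows. The data here are synthetic stand-ins, not the EEG features, so the printed accuracies say nothing about the results in Table 1. The function names, the random seeds, and the candidate values of K are assumptions for illustration only.

```python
import numpy as np

def split_70_30(X, y, seed=0):
    """Shuffle the samples and split them 70% / 30%."""
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = (7 * len(X)) // 10
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]

def knn_accuracy(X_tr, y_tr, X_te, y_te, k):
    """Accuracy of a Euclidean k-NN vote on standardized features."""
    mu, sigma = X_tr.mean(axis=0), X_tr.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)
    Z_tr, Z_te = (X_tr - mu) / sigma, (X_te - mu) / sigma
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((Z_te[:, None, :] - Z_tr[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    votes = y_tr[nearest]                         # shape (n_test, k)
    pred = np.array([np.bincount(v).argmax() for v in votes])
    return float(np.mean(pred == y_te))

# Synthetic stand-in data: three well-separated classes, NOT the EEG features.
rng = np.random.default_rng(42)
y = np.repeat(np.arange(3), 100)
X = rng.normal(size=(300, 5)) + 4.0 * y[:, None]

X_tr, y_tr, X_te, y_te = split_70_30(X, y)
for k in (1, 3, 5, 7):
    print(k, knn_accuracy(X_tr, y_tr, X_te, y_te, k))
```

Holding the split fixed while varying only K, as done here, matches the condition stated above that the training feature matrix was held constant.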

As can be seen, the performance of the algorithm was affected by the value of K. However, because of the number of classes being used, even the worst performance retained an acceptable level of accuracy.

As the main goal was to test the on-line use of the Emotiv™ headset, the algorithm was trained using data acquired by this system, while preserving the method of feature generation.

5 Conclusions and future work

The method proposed in this study was able to predict nine emotions, nine valences, and nine arousal classes. As reported in [5], the analysis of the power of the cerebral bands allows noise to be disaggregated with minimal effect on the system.

In desk tests, the algorithm achieved a maximum accuracy of approximately 97%, representing an improvement over the accuracy rates reported in the literature.

However, to obtain accurate predictions, the classifier must be trained on the data obtained using the same (Emotiv Epoc™) platform.

Emotion elicitation experiments based on the Mahnob-HCI Tagging Database [11] should be conducted using Mexican participants for the training of the KNN classifier.

In the near future, we plan to use this algorithm for driving monitoring. This was proposed in [6], with the Neurological Variables through the EEG system included in the full system. As seen from Fig. 4, the human variables are within a single dataset but remain independent of each other.

This can be implemented as a parallel system in health monitoring of people with communication, movement, or mental disorders. Knowledge of their emotional states should facilitate interaction with them. The algorithm can also be used in brain-computer control interfaces, or in neuro-marketing to allow better objective evaluation of a product.

5.1 A perspective for the future

In future developments, algorithms of this kind may be used to establish connections between human users and devices. The Internet of Things is rapidly developing, and biometric information may become readily available. This will make it easier to determine the health status or the emotional state of a person. It will allow lifestyle improvements by meeting personal preferences in a more customized way.

Industry 4.0 provides another potential application, as product development becomes connected to the designer, to the factory, to the store, and to the final user. In future, products may be unconsciously designed by users, to fully satisfy their requirements.




This research was supported by Tecnologico de Monterrey and Consejo Nacional de Ciencia y Tecnologia (CONACYT) Mexico, under scholarship 593255. We give special thanks to the Instituto Cajal for present and future collaboration.


  1. Im, C.-H., Lee, J.-H., Lim, J.-H.: Neurocinematics based on passive BCI: decoding temporal change of emotional arousal during video watching from multi-channel EEG. In: 2015 10th Asian Control Conference (ASCC), pp. 1–3. IEEE (2015).
  2. Chen, M., Han, J., Guo, L., Wang, J., Patras, I.: Identifying valence and arousal levels via connectivity between EEG channels. In: 2015 International Conference on Affective Computing and Intelligent Interaction, ACII 2015, pp. 63–69 (2015).
  3. Cheutet, V., Léon, J.C., Catalano, C.E., Giannini, F., Monti, M., Falcidieno, B.: Preserving car stylists design intent through an ontology. Int. J. Interact. Des. Manuf. (IJIDeM) 2(1), 9–16 (2008).
  4. Garbas, J.U., Ruf, T., Mattias, U., Dieckmann, A.: Towards robust real-time valence recognition from facial expressions for market research applications. In: Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), pp. 570–575 (2013).
  5. Izquierdo-Reyes, J., Ramirez-Mendoza, R.A., Bustamante-Bello, M.R.: A study of the effects of advanced driver assistance systems alerts on driver performance. Int. J. Interact. Des. Manuf. (IJIDeM) (2017).
  6. Izquierdo-Reyes, J., Ramirez-Mendoza, R.A., Bustamante-Bello, M.R., Navarro-Tuch, S., Avila-Vazquez, R.: Advanced driver monitoring for assistance system (ADMAS). Int. J. Interact. Des. Manuf. (IJIDeM) (2016).
  7. Jirayucharoensak, S., Pan-Ngum, S., Israsena, P.: EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci. World J. 2014, 1–10 (2014).
  8. Koelstra, S., Muhl, C., Soleymani, M., Lee, J.-S., Yazdani, A., Ebrahimi, T., Pun, T., Nijholt, A., Patras, I.: DEAP: a database for emotion analysis using physiological signals. IEEE Trans. Affect. Comput. 3(1), 18–31 (2012).
  9. Kolli, A., Fasih, A., Machot, F.A., Kyamakya, K.: Non-intrusive car driver's emotion recognition using thermal camera. In: 2011 Joint 3rd Int’l Workshop on Nonlinear Dynamics and Synchronization (INDS) & 16th Int’l Symposium on Theoretical Electrical Engineering (ISTET) (2011).
  10. Kumar, J., Kumar, J.: Affective modelling of users in HCI using EEG. Procedia Comput. Sci. 84, 107–114 (2016).
  11. Lichtenauer, J., Soleymani, M.: Mahnob-Hci-Tagging Database. Tech. rep., London (2011).
  12. Navarro-Tuch, S.A., Bustamante-Bello, M.R., Molina, A., Izquierdo-Reyes, J., Avila-Vazquez, R., Pablos-Hach, J.L., Gutiérrez-Martínez, Y.: Inhabitable space control for the creation of healthy interactive spaces through emotional domotics. Int. J. Interact. Des. Manuf. (IJIDeM) (2017).
  13. Petiot, J.F., Dagher, A.: Preference-oriented form design: application to cars headlights. Int. J. Interact. Des. Manuf. (IJIDeM) 5(1), 17–27 (2011).
  14. Selvaraj, J., Murugappan, M., Wan, K., Yaacob, S.: Classification of emotional states from electrocardiogram signals: a non-linear approach based on Hurst. Biomed. Eng. Online 12(1), 44 (2013).
  15. Soleymani, M., Lichtenauer, J., Pun, T., Pantic, M.: A multimodal database for affect recognition and implicit tagging. IEEE Trans. Affect. Comput. 3(1), 42–55 (2012).
  16. Solomon, O.M.: PSD Computations Using Welch's Method. Tech. Rep. December, Sandia National Laboratories (1991).
  17. Tivatansakul, S., Ohkura, M.: Emotion recognition using ECG signals with local pattern description methods. Int. J. Affect. Eng. 15(2), 51–61 (2016).
  18. Torres-Valencia, C., Álvarez-López, M., Orozco-Gutiérrez, Á.: SVM-based feature selection methods for emotion recognition from multimodal data. J. Multimodal User Interfaces 11(1), 9–23 (2017).
  19. Wang, S., Liu, Z., Lv, S., Lv, Y., Wu, G., Peng, P., Chen, F., Wang, X.: A natural visible and infrared facial expression database for expression recognition and emotion inference. IEEE Trans. Multimed. 12(7), 682–691 (2010).
  20. Wang, S., Shen, P., Liu, Z.: Facial expression recognition from infrared thermal images using temperature difference by voting. In: Proceedings of IEEE CCIS2012, pp. 94–98 (2012).
  21. Wu, G., Liu, G., Hao, M.: The analysis of emotion recognition from GSR based on PSO. In: Proceedings of the 2010 International Symposium on Intelligence Information Processing and Trusted Computing, IPTC 2010, pp. 360–363 (2010).
  22. Xu, Y., Liu, G., Hao, M., Wen, W., Huang, X.: Analysis of affective ECG signals toward emotion recognition. J. Electron. (China) 27(1), 8–14 (2010).

Copyright information

© Springer-Verlag France SAS, part of Springer Nature 2018

Authors and Affiliations

  • Javier Izquierdo-Reyes (1, 2)
  • Ricardo A. Ramirez-Mendoza (1)
  • Martin R. Bustamante-Bello (1)
  • Jose L. Pons-Rovira (2)
  • Jose E. Gonzalez-Vargas (2)

  1. School of Engineering and Sciences, Tecnologico de Monterrey, Mexico City, Mexico
  2. Cajal Institute, Spanish National Research Council, Madrid, Spain
