Detecting changing emotions in human speech by machine and humans


The goals of this research were (1) to develop a system that automatically measures changes in a speaker's emotional state by analyzing his/her voice, (2) to validate this system in a controlled experiment, and (3) to visualize the results for the speaker in a two-dimensional space. Natural (non-acted) speech from 77 Dutch speakers was collected and manually divided into meaningful speech units. Three recordings were collected per speaker: one each while the speaker was in a positive, a neutral and a negative state. For each recording, the speaker rated 16 emotional states on a 10-point Likert scale. The Random Forest algorithm was applied to 207 speech features extracted from the recordings to qualify (classification) and quantify (regression) the changes in the speaker's emotional state. Results showed that both the direction of change in emotions and the change in intensity, the latter measured by mean squared error, can be predicted better than their respective baselines (the most frequent class label and the mean value of change). Moreover, changes in negative emotions turned out to be more predictable than changes in positive emotions. A controlled experiment investigated the difference between human and machine performance in judging the emotional states in one's own voice and in that of another speaker. Results showed that humans performed worse than the algorithm on both the detection and the regression problems. Like the algorithm, humans were better at detecting changes in negative emotions than in positive ones. Finally, applying Principal Component Analysis (PCA) to our data provided validation for dimensional theories of emotion and suggests that PCA is a promising technique for visualizing the user's emotional state in the envisioned application.
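The abstract evaluates the Random Forest against two trivial baselines: the most frequent class label for the direction of change, and the mean change for the intensity, scored by mean squared error. The following is a minimal sketch of how such targets and baselines could be computed, assuming the change for one emotion is simply the difference between its Likert ratings in two recordings and that the direction is labelled "up", "down" or "same". All function names and the toy ratings are hypothetical illustrations, and the Random Forest itself is not reproduced here.

```python
from collections import Counter
from statistics import mean


def change_targets(before, after):
    """Hypothetical helper: regression and classification targets for one
    emotion from paired Likert ratings of two recordings.
    Assumption: change = after - before; direction is the sign of the change."""
    deltas = [a - b for b, a in zip(before, after)]
    directions = ["up" if d > 0 else "down" if d < 0 else "same" for d in deltas]
    return deltas, directions


def majority_baseline(train_labels, n_test):
    # Predict the most frequent training label for every test item.
    label, _ = Counter(train_labels).most_common(1)[0]
    return [label] * n_test


def mean_baseline(train_deltas, n_test):
    # Predict the mean training change for every test item.
    m = mean(train_deltas)
    return [m] * n_test


def accuracy(y_true, y_pred):
    # Fraction of exactly matching class labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    # Mean squared error between true and predicted changes.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy ratings, invented purely for illustration.
    before = [3, 5, 7, 2, 6, 4]
    after = [6, 5, 4, 5, 8, 1]
    deltas, dirs = change_targets(before, after)
    train_d, test_d = deltas[:4], deltas[4:]
    train_c, test_c = dirs[:4], dirs[4:]
    print(accuracy(test_c, majority_baseline(train_c, len(test_c))))  # 0.5
    print(mse(test_d, mean_baseline(train_d, len(test_d))))  # 7.8125
```

A model is only informative if it beats these numbers on the same test split; scikit-learn's dummy estimators offer the same two strategies if a library is preferred.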
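The abstract proposes PCA for placing a speaker's emotional state in a two-dimensional space. As a minimal sketch, assuming each row is one recording and each column one rated emotional state (for example, the 16 Likert ratings), the first two principal components can be found by power iteration on the covariance matrix, re-orthogonalizing each new component against those already found. This is generic PCA, not the authors' pipeline; the function names are hypothetical, and in practice a linear-algebra library would replace the hand-written eigen solver.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of row-wise observations (columns = variables)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def _orthonormalize(v, basis):
    # Gram-Schmidt: remove components along already-found directions, then normalize.
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 1e-12 else None


def principal_components(cov, k=2, iters=500, seed=0):
    """Top-k eigenvectors of a symmetric covariance matrix by power iteration.
    Assumes k <= number of variables."""
    rng = random.Random(seed)
    d = len(cov)
    basis = []
    for _ in range(k):
        v = _orthonormalize([rng.uniform(-1, 1) for _ in range(d)], basis)
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            w = _orthonormalize(w, basis)
            if w is None:  # no variance left outside the directions already found
                break
            v = w
        basis.append(v)
    return basis


def pca_2d(rows):
    """Project centered observations onto the first two principal components."""
    cov, centered = covariance(rows)
    pcs = principal_components(cov, k=2)
    return [[sum(x * y for x, y in zip(c, pc)) for pc in pcs] for c in centered]


if __name__ == "__main__":
    # Toy ratings of three emotions for four recordings, invented for illustration.
    ratings = [[7, 2, 3], [5, 4, 4], [2, 8, 6], [3, 6, 7]]
    for x, y in pca_2d(ratings):
        print(f"{x:.2f} {y:.2f}")
```

The sign of each component is arbitrary, so only relative positions in the resulting 2-D plot are meaningful. When the two leading eigenvalues are nearly equal, power iteration converges slowly and may need more iterations.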


Fig. 1
Fig. 2
Fig. 3



Author information

Corresponding author

Correspondence to C. Natalie van der Wal.


About this article

Cite this article

van der Wal, C.N., Kowalczyk, W. Detecting changing emotions in human speech by machine and humans. Appl Intell 39, 675–691 (2013).



Keywords

  • Affective computing
  • Vocal expression
  • Emotion recognition
  • Speech features
  • Random forests