Applied Intelligence

Volume 39, Issue 4, pp 675–691

Detecting changing emotions in human speech by machine and humans

Abstract

The goals of this research were: (1) to develop a system that automatically measures changes in the emotional state of a speaker by analyzing his/her voice, (2) to validate this system with a controlled experiment, and (3) to visualize the results for the speaker in 2-D space. Natural (non-acted) speech of 77 (Dutch) speakers was collected and manually divided into meaningful speech units. Three recordings were collected per speaker, in which he/she was in a positive, neutral, and negative state, respectively. For each recording, the speakers rated 16 emotional states on a 10-point Likert scale. The Random Forest algorithm was applied to 207 speech features extracted from the recordings to qualify (classification) and quantify (regression) the changes in the speaker's emotional state. Results showed that both the direction of emotional change and the intensity of change (measured by mean squared error) can be predicted better than the baseline (the most frequent class label and the mean change value, respectively). Moreover, changes in negative emotions turned out to be more predictable than changes in positive emotions. A controlled experiment investigated the difference between human and machine performance in judging the emotional states in one's own voice and in that of another. Humans performed worse than the algorithm on both the detection and the regression problem. Like the algorithm, humans were better at detecting changes in negative emotions than in positive ones. Finally, applying Principal Component Analysis (PCA) to our data provided a validation of dimensional emotion theories and suggests that PCA is a promising technique for visualizing a user's emotional state in the envisioned application.
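
The pipeline described above can be made concrete with a short sketch. The following is a minimal illustration, not the authors' implementation: it uses scikit-learn's Random Forest and PCA on synthetic stand-in data (the real study used 207 features extracted from 77 speakers' recordings), and the hyperparameters, train/test split, and dummy baselines are assumptions chosen only to mirror the comparisons reported in the abstract.

```python
# Sketch of the reported setup: Random Forest classification (direction of
# emotional change) and regression (intensity of change), each compared to
# its stated baseline, plus a PCA projection to 2-D for visualization.
# All data below is synthetic; array shapes mimic 3 recordings x 77 speakers.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(231, 207))        # 231 recordings, 207 speech features
y_dir = rng.integers(0, 3, size=231)   # direction of change (3 classes)
y_int = rng.normal(size=231)           # intensity of change (continuous)

X_tr, X_te, d_tr, d_te, i_tr, i_te = train_test_split(
    X, y_dir, y_int, test_size=0.3, random_state=0)

# Classification: direction of change vs. the most-frequent-class baseline.
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, d_tr)
base_clf = DummyClassifier(strategy="most_frequent").fit(X_tr, d_tr)
print("RF accuracy:      ", accuracy_score(d_te, clf.predict(X_te)))
print("Baseline accuracy:", accuracy_score(d_te, base_clf.predict(X_te)))

# Regression: intensity of change vs. the mean-value baseline, scored by MSE.
reg = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, i_tr)
base_reg = DummyRegressor(strategy="mean").fit(X_tr, i_tr)
print("RF MSE:      ", mean_squared_error(i_te, reg.predict(X_te)))
print("Baseline MSE:", mean_squared_error(i_te, base_reg.predict(X_te)))

# PCA projection of the feature space to two dimensions, the kind of 2-D
# view envisioned for visualizing a speaker's emotional state.
coords = PCA(n_components=2).fit_transform(X)
print("2-D coordinates of first recording:", coords[0])
```

On real (non-synthetic) data, the interesting comparison is between the Random Forest scores and the dummy baselines, which is exactly the better-than-baseline claim the abstract makes.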

Keywords

Affective computing · Vocal expression · Emotion recognition · Speech features · Random forests

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Artificial Intelligence, VU University Amsterdam, Amsterdam, The Netherlands
  2. Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
