Emotion and Disposition Detection in Medical Machines: Chances and Challenges

  • Kim Hartmann
  • Ingo Siegert
  • Dmytro Prylipko
Part of the Intelligent Systems, Control and Automation: Science and Engineering book series (ISCA, volume 74)


Machines designed for medical applications beyond routine data acquisition and processing need to cooperate with and adapt to humans in order to fulfill their supportive tasks. Technically, such medical machines are therefore regarded as affective systems, capable of detecting, assessing and adapting to emotional states and dispositional changes in users. One emerging application of affective systems is their use as supportive machines in the diagnosis and therapy of psychiatric disorders. These machines have the additional requirement of being able to conduct persuasive dialogues in order to obtain relevant patient data despite disadvantageous set-ups. These automated abilities, combined with enhanced processing, storage and observational capabilities, raise both chances and challenges in medical applications. We focus on the objectivity, reliability and validity of current techniques used to determine the emotional state of a speaker from speech, and on the implications that arise. We discuss the underlying technical and psychological models and analyze recent machine assessments of emotional states obtained through dialogues. We conclude by discussing the involvement of affective systems as medical machines in psychiatric diagnostics and therapy sessions with respect to the technical and ethical circumstances.
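The reliability of emotion annotation mentioned above is commonly quantified with chance-corrected agreement coefficients such as Cohen's kappa. A minimal sketch follows; the emotion labels and rater names are hypothetical, chosen only to illustrate the computation for two annotators labelling the same dialogue turns.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement under independence, from each
    # annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical emotion labels from two annotators for six dialogue turns.
rater_1 = ["anger", "joy", "neutral", "anger", "neutral", "joy"]
rater_2 = ["anger", "neutral", "neutral", "anger", "neutral", "joy"]
print(cohens_kappa(rater_1, rater_2))  # 0.75
```

On the common Landis and Koch scale, a kappa of 0.75 would count as "substantial" agreement; real emotion corpora often yield considerably lower values, which is one source of the validity concerns discussed in the chapter.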



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Cognitive Systems Group, Otto von Guericke University Magdeburg, Magdeburg, Germany