International Journal of Social Robotics, Volume 7, Issue 4, pp 451–463

Inference of Human Beings’ Emotional States from Speech in Human–Robot Interactions

  • Laurence Devillers
  • Marie Tahon
  • Mohamed A. Sehili
  • Agnes Delaborde


The challenge of this study is twofold: recognizing emotions from audio signals in a naturalistic Human–Robot Interaction (HRI) environment, and using cross-dataset recognition to evaluate robustness. The originality of this work lies in the use of six emotional models in parallel, generated from two training corpora and three acoustic feature sets. The models are trained on two databases collected in different tasks, and a third, independent real-life HRI corpus (collected within the ROMEO project) is used for testing. As a primary result, for the task of four-emotion recognition, combining the probabilistic outputs of the six systems in a very simple way yields better results than the best baseline system. Moreover, to investigate the potential of fusing many systems’ outputs with a “perfect” fusion method, we calculate the oracle performance (the oracle counts a prediction as correct if at least one of the systems outputs a correct prediction). The oracle score is 73 %, while the auto-coherence score on the same corpus (i.e. the performance obtained by using the same data for training and for testing) is about 57 %. We experiment with a reliability estimation protocol that makes use of the outputs of many systems. Such a reliability measure for an emotion recognition system’s decision could help to construct a relevant emotional and interactional user profile, which could be used to drive the expressive behavior of the robot.
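
The two quantities contrasted above, a simple fusion of several systems' probabilistic outputs and the oracle score, can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: the "very simple" combination is taken here to be an unweighted mean of class probabilities, the score is plain accuracy, and the function names and the `probs[s][i][c]` layout (system, sample, class) are hypothetical.

```python
from typing import List, Sequence

# probs[s][i][c]: probability that system s assigns to class c for sample i.
# This layout is an assumption made for illustration only.
Probs = Sequence[Sequence[Sequence[float]]]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first index wins on ties)."""
    return max(range(len(values)), key=lambda k: values[k])


def fuse_by_mean(probs: Probs) -> List[int]:
    """Average class probabilities over systems, then pick the top class.

    Assumption: unweighted averaging stands in for the simple fusion
    mentioned in the abstract; the actual rule may differ.
    """
    n_systems = len(probs)
    n_samples = len(probs[0])
    n_classes = len(probs[0][0])
    fused = []
    for i in range(n_samples):
        mean = [
            sum(probs[s][i][c] for s in range(n_systems)) / n_systems
            for c in range(n_classes)
        ]
        fused.append(argmax(mean))
    return fused


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def oracle_accuracy(probs: Probs, labels: Sequence[int]) -> float:
    """A sample counts as correct if at least one system predicts its label."""
    hits = 0
    for i, y in enumerate(labels):
        if any(argmax(system[i]) == y for system in probs):
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Toy data: 2 systems, 3 samples, 3 classes (made up, not from the paper).
    toy_probs = [
        [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.35, 0.25]],
        [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.1, 0.2, 0.7]],
    ]
    toy_labels = [0, 2, 1]
    print("fused accuracy:", accuracy(fuse_by_mean(toy_probs), toy_labels))
    print("oracle accuracy:", oracle_accuracy(toy_probs, toy_labels))
```

The oracle is an upper bound on what any decision-level fusion of these systems could reach, since no fusion rule can recover a sample that every system gets wrong. The gap between the fused score and the oracle score therefore indicates how much a better fusion method could still gain.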


Human–robot interaction, Emotion recognition, Prediction reliability, Real-life data



This work was partially funded by the French projects FUI ROMEO and BPI ROMEO2. The authors thank coders and co-workers who participated in elaborating protocols and annotating emotional states.



Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  1. LIMSI-CNRS, Orsay, France
  2. Université Paris-Sorbonne IV, Paris, France
