Inference of Human Beings’ Emotional States from Speech in Human–Robot Interactions

Devillers, Laurence; Tahon, Marie; Sehili, Mohamed A.; Delaborde, Agnes

doi:10.1007/s12369-015-0297-8

Published: 10 April 2015

Volume 7, pages 451–463, (2015)
International Journal of Social Robotics

Laurence Devillers¹,² (ORCID: orcid.org/0000-0001-9894-172X),
Marie Tahon¹,
Mohamed A. Sehili¹ &
Agnes Delaborde¹

Abstract

The challenge of this study is twofold: recognizing emotions from audio signals in a naturalistic Human–Robot Interaction (HRI) environment, and using cross-dataset recognition to evaluate robustness. The originality of this work lies in the use of six emotional models in parallel, generated from two training corpora and three acoustic feature sets. The models are trained on two databases collected in different tasks, and a third, independent real-life HRI corpus (collected within the ROMEO project, http://www.projetromeo.com/) is used for testing. As a first result, for the task of four-emotion recognition, combining the probabilistic outputs of the six systems in a very simple way gave better results than the best baseline system. Moreover, to investigate the potential of fusing many systems’ outputs with a “perfect” fusion method, we calculate the oracle performance (the oracle counts a prediction as correct if at least one of the systems outputs the correct label). The oracle score obtained is 73 %, while the auto-coherence score on the same corpus (i.e. the performance obtained by using the same data for training and for testing) is about 57 %. We experiment with a reliability estimation protocol that makes use of the outputs of several systems. Such a reliability measure for an emotion recognition system’s decision could help to construct a relevant emotional and interactional user profile, which could be used to drive the expressive behavior of the robot.
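The two evaluation ideas in the abstract, combining several systems' probabilistic outputs and scoring an oracle, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the outputs are combined, so plain averaging of posteriors is used as a stand-in, and the function names, the three-system setup, the class indices, and the toy numbers are all hypothetical rather than taken from the paper.

```python
from typing import Sequence


def fuse_by_averaging(posteriors: Sequence[Sequence[float]]) -> int:
    """Return the class index with the highest mean posterior.

    `posteriors` holds one probability vector per system for a single
    speech segment. Averaging is an assumed stand-in for the simple
    combination mentioned in the abstract, not the authors' method.
    """
    n_systems = len(posteriors)
    n_classes = len(posteriors[0])
    mean = [sum(p[c] for p in posteriors) / n_systems for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


def oracle_accuracy(predictions: Sequence[Sequence[int]], gold: Sequence[int]) -> float:
    """Fraction of segments for which at least one system is correct.

    `predictions` holds one list of predicted labels per system,
    aligned with the reference labels in `gold`.
    """
    hits = sum(
        any(system[i] == label for system in predictions)
        for i, label in enumerate(gold)
    )
    return hits / len(gold)


# Toy data invented for illustration: 3 systems, 4 emotion classes (0..3).
segment_posteriors = [
    [0.5, 0.2, 0.2, 0.1],  # system 1 prefers class 0
    [0.1, 0.6, 0.2, 0.1],  # system 2 prefers class 1
    [0.3, 0.4, 0.2, 0.1],  # system 3 prefers class 1
]
fused_label = fuse_by_averaging(segment_posteriors)

gold_labels = [0, 1, 2, 3]
system_predictions = [
    [0, 1, 0, 1],
    [0, 2, 2, 1],
    [3, 1, 0, 0],
]
oracle = oracle_accuracy(system_predictions, gold_labels)
print(fused_label, oracle)
```

On this toy data the fused label is class 1, and the oracle score is 0.75 because no system is right on the last segment. The oracle is an upper bound on what any fusion rule could reach from these per-system decisions, which is why the abstract treats it as the "perfect" fusion reference.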

Notes

Efforts are under way to release at least one of these corpora to the community; doing so will require specific data formatting and the agreement of all participants.
Vision Institute, 11 rue Moreau, 75012 Paris.

Acknowledgments

This work was partially funded by the French projects FUI ROMEO and BPI ROMEO2. The authors thank coders and co-workers who participated in elaborating protocols and annotating emotional states.

Author information

Authors and Affiliations

LIMSI-CNRS, Orsay, France
Laurence Devillers, Marie Tahon, Mohamed A. Sehili & Agnes Delaborde
Université Paris-Sorbonne IV, Paris, France
Laurence Devillers

Corresponding author

Correspondence to Laurence Devillers.

About this article

Cite this article

Devillers, L., Tahon, M., Sehili, M.A. et al. Inference of Human Beings’ Emotional States from Speech in Human–Robot Interactions. Int J of Soc Robotics 7, 451–463 (2015). https://doi.org/10.1007/s12369-015-0297-8

Accepted: 24 March 2015
Published: 10 April 2015
Issue Date: August 2015
DOI: https://doi.org/10.1007/s12369-015-0297-8
