International Journal of Social Robotics, Volume 4, Issue 1, pp 29–51

Visuo-auditory Multimodal Emotional Structure to Improve Human-Robot-Interaction

  • José Augusto Prado
  • Carlos Simplício
  • Nicolás F. Lori
  • Jorge Dias


We propose an approach to analyze and synthesize a set of human facial and vocal expressions, and then use the classified expressions to decide the robot’s response in a human-robot interaction. During a human-to-human conversation, a person senses the interlocutor’s face and voice, perceives her/his emotional expressions, and processes this information to decide which response to give. The response may be aggressive, funny (henceforth meaning humorous) or simply neutral, depending not only on the observed emotions but also on the personality of the person. The purpose of the proposed structure is to endow robots with the capability to model human emotions, which requires solving several subproblems: feature extraction, classification, decision and synthesis. In the proposed approach we integrate two classifiers for emotion recognition, one from audio and one from video, and then use a new method to fuse their outputs with the social behavior profile. To keep the person engaged in the interaction, after each iteration of analysis the robot synthesizes human voice with both lip synchronization and facial expressions. The social behavior profile governs the personality of the robot. The structure and workflow of the synthesis and decision stages are addressed, and the Bayesian networks are discussed. We also study how to analyze and synthesize emotion from facial and vocal expressions. The result is a new probabilistic structure that enables a higher level of interaction between a human and a robot.
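
The fusion and decision steps described above can be illustrated with a minimal sketch. It is not the authors' Bayesian network: it assumes the two classifiers each output a posterior over the same emotion labels, that the modalities are conditionally independent given the emotion (a naive Bayes simplification), and that the social behavior profile can be written as a table of P(response | emotion). The label sets, the function names `fuse` and `choose_response`, and all numbers are hypothetical.

```python
# Hypothetical label sets; the abstract does not list the paper's actual ones.
EMOTIONS = ("happy", "sad", "angry", "neutral")
RESPONSES = ("aggressive", "funny", "neutral")


def fuse(p_face, p_voice, prior=None):
    """Combine two per-modality posteriors over EMOTIONS.

    Assumes conditional independence of face and voice given the emotion,
    so P(e | face, voice) is proportional to P(e | face) * P(e | voice) / P(e).
    """
    if prior is None:
        prior = {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}  # uniform prior
    unnorm = {e: p_face[e] * p_voice[e] / prior[e] for e in EMOTIONS}
    total = sum(unnorm.values())
    return {e: v / total for e, v in unnorm.items()}


def choose_response(p_emotion, profile):
    """Pick the most probable response under a social behavior profile.

    profile[e][r] is an assumed value of P(response r | emotion e).
    Returns the best response and the full score table.
    """
    scores = {
        r: sum(p_emotion[e] * profile[e][r] for e in EMOTIONS)
        for r in RESPONSES
    }
    return max(scores, key=scores.get), scores


# Illustrative inputs only: made-up classifier outputs and a "humorous" profile.
p_face = {"happy": 0.6, "sad": 0.1, "angry": 0.1, "neutral": 0.2}
p_voice = {"happy": 0.5, "sad": 0.1, "angry": 0.2, "neutral": 0.2}
humorous_profile = {
    "happy": {"aggressive": 0.05, "funny": 0.80, "neutral": 0.15},
    "sad": {"aggressive": 0.05, "funny": 0.35, "neutral": 0.60},
    "angry": {"aggressive": 0.30, "funny": 0.20, "neutral": 0.50},
    "neutral": {"aggressive": 0.05, "funny": 0.35, "neutral": 0.60},
}

fused = fuse(p_face, p_voice)
response, scores = choose_response(fused, humorous_profile)
```

Swapping in a different profile table changes the chosen response for the same fused emotion estimate, which is the role the abstract assigns to the social behavior profile.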


Visual perception, Auditory perception, Emotion recognition, Multimodal interaction, Social behavior profile, Bayesian networks





Copyright information

© Springer Science & Business Media BV 2011

Authors and Affiliations

  • José Augusto Prado (1), corresponding author
  • Carlos Simplício (1, 2)
  • Nicolás F. Lori (3)
  • Jorge Dias (1)

  1. Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
  2. Institute Polytechnic of Leiria, Leiria, Portugal
  3. Institute of Biomedical Research in Light and Image (IBILI), Faculty of Medicine, University of Coimbra, Coimbra, Portugal
