Estimating Speaker’s Engagement from Non-verbal Features Based on an Active Listening Corpus

  • Lei Zhang
  • Hung-Hsuan Huang
  • Kazuhiro Kuwabara
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10914)


The number of elderly people living alone has been increasing rapidly in recent years. Maintaining social contact with others is reported to benefit their mental health. Our project aims to develop a listener agent that can engage elderly users in active listening dialog. Active listening is a communication technique in which the listener listens to the speaker carefully and attentively. The listener also asks questions to confirm, or to show concern about, what the speaker has said. For this task, it is essential for the agent to evaluate the user’s engagement level (or attitude) in the conversation. In this paper, we explore an automatic estimation method based on empirical results. An active listening conversation experiment with human-human participants was conducted to collect a corpus. The speakers’ engagement attitude in the corpus was subjectively rated by human evaluators. Support vector regression models were built separately for three kinds of period: when the speaker is speaking, when the listener is speaking, and when no one is speaking. The models use non-verbal features extracted from facial expressions, head movements, prosody, and speech turns. The resulting accuracy was not high but showed the potential of the proposed method.
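The three-way split by speech state described above can be illustrated with a minimal sketch. It only shows how time-stamped feature frames might be routed to the speaker-speaking, listener-speaking, and silence groups before a separate regression model is trained on each group. The interval format, the function names, and the rule that the speaker takes priority when turns overlap are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical format: a speech turn is a (start_sec, end_sec) pair.
Interval = Tuple[float, float]


def is_active(t: float, turns: List[Interval]) -> bool:
    """Return True if time t falls inside any of the given speech turns."""
    return any(start <= t < end for start, end in turns)


def label_period(t: float,
                 speaker_turns: List[Interval],
                 listener_turns: List[Interval]) -> str:
    """Assign a frame time to one of the three period types.

    Assumption: when both parties talk at once, the frame is counted
    as a speaker-speaking frame.
    """
    if is_active(t, speaker_turns):
        return "speaker_speaking"
    if is_active(t, listener_turns):
        return "listener_speaking"
    return "silence"


def split_by_period(frame_times: List[float],
                    speaker_turns: List[Interval],
                    listener_turns: List[Interval]) -> Dict[str, List[float]]:
    """Group frame times by period so each group can feed its own model."""
    groups: Dict[str, List[float]] = {
        "speaker_speaking": [],
        "listener_speaking": [],
        "silence": [],
    }
    for t in frame_times:
        groups[label_period(t, speaker_turns, listener_turns)].append(t)
    return groups
```

In such a setup, each group would then be paired with its non-verbal feature vectors and engagement ratings and passed to its own regression model; that training step is omitted here because the abstract does not specify it.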


Keywords: Active listening, Elderly support, Multimodal interaction



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Lei Zhang (1)
  • Hung-Hsuan Huang (1, 2), corresponding author
  • Kazuhiro Kuwabara (1)
  1. College of Information Science and Engineering, Ritsumeikan University, Kusatsu, Japan
  2. Center for Advanced Intelligence Project, RIKEN, Kyoto, Japan
