Head movements, facial expressions and feedback in conversations: empirical evidence from Danish multimodal data

Abstract

This article deals with multimodal feedback in two Danish multimodal corpora: a collection of map-task dialogues and a corpus of free conversations between pairs of subjects meeting for the first time. Machine learning techniques are applied to both datasets to investigate the relation between non-verbal behaviour (more specifically head movements and facial expressions) and speech in the expression of feedback. In the map-task data, we study the extent to which the dialogue act type of linguistic feedback expressions can be classified automatically from the non-verbal features. In the conversational data, on the other hand, non-verbal and speech features are used together to distinguish feedback from other multimodal behaviours. Overall, the results of the two sets of experiments indicate that head movements, and to a lesser extent facial expressions, are important indicators of feedback, and that gestures and speech disambiguate each other in the machine learning process.
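The experiments summarised above amount to supervised classification over annotated gesture and speech features. As a purely illustrative sketch (the feature names, act labels, and scikit-learn toolchain below are assumptions for exposition, not the paper's actual setup), the map-task experiment can be pictured as fitting a classifier that maps head-movement and facial-expression annotations to feedback dialogue act types:

```python
# Hypothetical sketch: classifying the dialogue act type of a feedback
# expression from annotated non-verbal features. Feature names and labels
# are invented stand-ins for the kinds of annotations the abstract mentions.

from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Each linguistic feedback token, paired with co-occurring gesture annotations.
training_data = [
    ({"head": "nod", "face": "smile", "word": "ja"}, "Accept"),
    ({"head": "shake", "face": "none", "word": "nej"}, "Reject"),
    ({"head": "nod", "face": "none", "word": "mm"}, "Acknowledge"),
    ({"head": "tilt", "face": "frown", "word": "ja"}, "Acknowledge"),
]

features, labels = zip(*training_data)

# One-hot encode the categorical annotations and fit a simple classifier.
model = make_pipeline(DictVectorizer(sparse=False), MultinomialNB())
model.fit(list(features), list(labels))

# Predict the feedback act type for a new multimodal observation.
print(model.predict([{"head": "nod", "face": "smile", "word": "ja"}]))
```

The second set of experiments would combine speech features (such as the word form of the feedback expression, as in the sketch) with the gesture features in the same feature dictionaries; the reported mutual disambiguation effect corresponds to the combined feature set outperforming either modality alone.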

Keywords

Gestures · Head movements · Facial expressions · Feedback · Backchanneling

Copyright information

© OpenInterface Association 2012

Authors and Affiliations

  1. Centre for Language Technology, University of Copenhagen, Copenhagen, Denmark
  2. Institute of Linguistics, University of Malta, Msida, Malta
