The Automatic Identification of the Producers of Co-occurring Communicative Behaviours


Abstract

Multimodal communicative behaviours depend on numerous factors such as the communicative situation, the task, the culture and mutual relationship of the people involved, and their role, age, and background. This paper addresses the identification of the producers of co-occurring communicative non-verbal behaviours in a manually annotated multimodal corpus of spontaneous conversations. The work builds upon a preceding study in which a support vector machine was trained to identify the producers of communicative body behaviours using the annotations of individual behaviour types. In the present work, we investigate to what extent classification results can be improved by adding the shape description of co-occurring body behaviours and temporal information to the training data. The inclusion of co-occurring behaviours reflects the fact that people often produce several body behaviours at the same time when they communicate. The results of the classification experiments show that the identification of the producers of communicative behaviours improves significantly when co-occurring behaviours are added to the training data. Classification performance improves further when temporal information is also included. Even though the results vary from behaviour type to behaviour type, they all show that the individual variation of communicative behaviours is large even in a very homogeneous group of people and that this variation is better modelled using information on co-occurring behaviours than on individual behaviours. Being able to identify and react correctly to the individual behaviours of people is extremely important in the field of social robotics, which involves the use of robots in private homes, where they must interact in a natural way with different types of persons having varying needs.
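
The article's own experimental pipeline is not reproduced on this page, but the abstract outlines a standard supervised setup: each annotated behaviour becomes a training instance whose features are its own shape label, the shape labels of co-occurring behaviours, and temporal information, and the class to predict is the participant who produced it. The following is a minimal sketch of that kind of setup using a scikit-learn SVM; the feature names and example rows are hypothetical placeholders, and the study itself relied on manually annotated corpus data and cites Weka and the SMO support vector machine algorithm [58, 59].

```python
# Minimal sketch (not the article's actual code) of a producer-identification
# experiment: a linear-kernel SVM trained on categorical annotation features.
# All feature names and data rows below are hypothetical placeholders.
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# One dict per annotated behaviour: its own shape label, the shape labels of
# co-occurring behaviours, and its duration in seconds (temporal information).
instances = [
    {"shape": "Nod",   "cooc_face": "Smile",    "cooc_hand": "None",        "duration": 0.48},
    {"shape": "Shake", "cooc_face": "None",     "cooc_hand": "IndexFinger", "duration": 0.90},
    {"shape": "Tilt",  "cooc_face": "Eyebrows", "cooc_hand": "None",        "duration": 0.35},
    {"shape": "Nod",   "cooc_face": "None",     "cooc_hand": "OpenPalm",    "duration": 0.60},
]
producers = ["A", "B", "A", "B"]  # class label: the participant who produced the behaviour

# DictVectorizer one-hot encodes the string-valued features and passes the
# numeric duration through unchanged; the SVM then separates the producers.
model = make_pipeline(DictVectorizer(), SVC(kernel="linear"))

# Cross-validated accuracy; removing the "cooc_*" keys gives the baseline
# condition that uses individual behaviours only.
print(cross_val_score(model, instances, producers, cv=2).mean())
```

Comparing the cross-validated score with and without the co-occurrence and duration features mirrors the comparison reported in the abstract, namely that producers are identified more reliably once co-occurring behaviours and temporal information are included.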


Notes

  1. http://dk-clarin.dk.

  2. http://talkbank.org.

  3. These are also connected to the most common functions and the semiotic types [60] of each type of behaviour.

References

  1. Yngve V. On getting a word in edgewise. In: Papers from the sixth regional meeting of the Chicago Linguistic Society; 1970, p. 567–78.

  2. McClave E. Linguistic functions of head movements in the context of speech. J Pragmat. 2000;32:855–78.

  3. Cerrato L. Investigating communicative feedback phenomena across languages and modalities. Ph.D. thesis. Stockholm, KTH: Speech and Music Communication; 2007.

  4. Paggio P, Navarretta C. Learning to classify the feedback function of head movements in a Danish corpus of first encounters. In: Proceedings of ICMI2011 workshop multimodal corpora for machine learning: taking stock and road mapping the future. Alicante, Spain; 2011.

  5. Rehm M, Andre E, Bee N, Endrass B, Wissner M, Nakano Y, Lipi AA, Nishida T, Huang HH. Creating standardized video recordings of multimodal interactions across cultures. In: Kipp M, Martin JC, Paggio P, Heylen D, editors. Multimodal corpora. From models of natural interaction to systems and applications, no. 5509 in LNAI. Springer; 2009. p. 138–59.

  6. Navarretta C, Ahlsén E, Allwood J, Jokinen K, Paggio P. Feedback in Nordic first-encounters: a comparative study. In: Proceedings of LREC 2012. Istanbul, Turkey; 2012, p. 2494–99.

  7. de Kok IA, Heylen DKJ. Differences in listener responses between procedural and narrative tasks. In: Proceedings of the 2nd international workshop on social signal processing, SSPW ’10. Florence, Italy, New York: ACM; 2010. p. 5–10.

  8. Navarretta C, Paggio P. Verbal and non-verbal feedback in different types of interactions. In: Proceedings of LREC 2012. Istanbul, Turkey; 2012. p. 2338–42.

  9. Navarretta C. Annotating behaviours in informal interactions. In: Esposito A, Vinciarelli A, Vicsi K, Pelachaud C, Nijholt A, editors. Analysis of verbal and nonverbal communication and enactment: the processing issues, LNCS 6800. Berlin: Springer; 2011. p. 317–24.

  10. de Kok IA, Heylen DKJ. The MultiLis corpus—dealing with individual differences in nonverbal listening behavior. In: Esposito A, Esposito AM, Martone R, Müller VC, Scarpetta G, editors. Third COST 2102 international training school, Caserta, Italy, lecture notes in computer science, vol. 6456. Berlin: Springer; 2011. p. 362–75.

  11. Turk M. Computer vision in the interface. Commun ACM. 2004;47(1):60–7.

  12. Kumar M, Garfinkel T, Boneh D, Winograd T. Reducing shoulder-surfing by using gaze-based password entry. In: Proceedings of the 3rd symposium on usable privacy and security, SOUPS ’07. New York, NY: ACM; 2007, p. 13–19.

  13. Idris F, Panchanathan S. Review of image and video indexing techniques. J Vis Commun Image Represent. 1997;8(2):146–66.

  14. Hu W, Xie N, Li L, Zeng X, Maybank S. A survey on visual content-based video indexing and retrieval. IEEE Trans Syst Man Cybern Part C Appl Rev. 2011;41(6):797–819.

  15. Furui S. Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans Acoust Speech Signal Process. 1985;ASSP-34(1):52–59.

  16. Sanchez MH, Ferrer L, Shriberg E, Stolcke A. Constrained cepstral speaker recognition using matched UBM and JFA training. In: Proceedings of interspeech-2011. Florence, Italy; 2011. p. 141–44.

  17. Harb H, Chen L. Voice-based gender identification in multimedia applications. J Intell Inf Syst. 2005;24(2):179–98.

  18. Metze F, Ajmera J, Englert R, Bub U, Burkhardt F, Stegmann J, Muller C, Huber R, Andrassy B, Bauer JG, Little B. Comparison of four approaches to age and gender recognition for telephone applications. In: Proceedings of 2007 IEEE international conference acoustics, speech and signal processing, vol. 4. Honolulu; 2007. p. 1089–92.

  19. Brunelli R, Falavigna D, Poggio T, Stringa L. Automatic person recognition by acoustic and geometric features. Mach Vis Appl. 1995;8(5):317–25.

  20. Beigi H. Fundamentals of speaker recognition. Berlin: Springer; 2011.

  21. Erzin E, Yemez Y, Tekalp A. Multimodal speaker identification using an adaptive classifier cascade based on modality reliability. IEEE Trans Multimed. 2005;7(5):840–55.

  22. Wu Z, Cai L, Meng H. Multi-level fusion of audio and visual features for speaker identification. In: International conference on advances in biometrics; 2006. p. 493–99.

  23. Liew AWC, Wang S, editors. Visual speech recognition: lip segmentation and mapping. IGI Global; 2009.

  24. Stiefelhagen R. Tracking focus of attention in meetings. In: Multimodal interfaces 2002. IEEE international conference on multimodal interfaces. Pittsburgh, PA, USA; 2002. p. 273–80.

  25. Busso C, Deng Z, Yildirim S, Bulut M, Lee C, Kazemzadeh A, Lee S, Neumann U, Narayanan S. Analysis of emotion recognition using facial expressions, speech and multimodal information. In: Proceedings of ACM 6th international conference on multimodal interfaces—ICMI 2004. State College, PA; 2004. p. 205–11.

  26. Bourbakis N, Esposito A, Kavraki D. Extracting and associating meta-features for understanding people’s emotional behaviour: face and speech. Cogn Comput. 2011;3:436–48.

  27. Esposito A, Riviello MT. The cross-modal and cross-cultural processing of affective information. In: Proceedings of the 2011 conference on Neural Nets WIRN10: proceedings of the 20th Italian workshop on neural nets. Amsterdam, The Netherlands: IOS Press; 2011. p. 301–10.

  28. Morency LP, de Kok I, Gratch J. A probabilistic multimodal approach for predicting listener backchannels. Auton Agents Multi-Agent Syst. 2009;20:70–84.

  29. Paggio P, Navarretta C. Head movements, facial expressions and feedback in conversations—empirical evidence from Danish multimodal data. J Multimodal User Interfaces (special issue on multimodal corpora). 2013;7(1–2):29–37.

  30. Neff M, Kipp M, Albrecht I, Seidel H-P. Gesture modeling and animation based on a probabilistic re-creation of speaker style. ACM Trans Graph. 2008;27(1):1–24.

  31. Bergmann K, Kopp S. GNetIc-using Bayesian decision networks for iconic gesture generation. In: Ruttkay Z et al., editors. Proceedings of the 9th conference on intelligent virtual agents (LNAI 5773); 2009. p. 76–89.

  32. Bergmann K, Kopp S. Modelling the production of co-verbal iconic gestures by learning Bayesian decision networks. Appl Artif Intell. 2010;24(6):530–51.

  33. Mancini M, Pelachaud C. Distinctiveness in multimodal behaviors. In: Seventh international joint conference on autonomous agents and multi-agent systems. AAMAS’08. Estoril, Portugal; 2008.

  34. de Sevin E, Pelachaud C, McRorie M, Sneddon I. Building credible agents: behaviour influenced by personality and emotional traits. In: KEER international conference on Kansei engineering and emotion research 2010. Paris; 2010. p. 1716–26.

  35. Mana N, Lepri B, Chippendale P, Cappelletti A, Pianesi F, Svaizer P, Zancanaro M. Multimodal corpus of multi-party meetings for automatic social behavior analysis and personality traits detection. In: Proceedings of the 2007 workshop on tagging, mining and retrieval of human related activity information, TMR ’07. New York, NY, USA: ACM; 2007. p. 9–14.

  36. Hostetter AB, Alibali MW. Raise your hand if you're spatial. Gesture. 2007;7(1):73–95.

  37. Navarretta C. Individuality in Communicative Bodily Behaviours. In: Esposito A, Esposito AM, Vinciarelli A, Hoffmann R, Müller VC, editors. Behavioural cognitive systems, lecture notes in computer science, vol. 7403. Berlin: Springer; 2012. p. 417–23.

  38. Gallaher P. Individual differences in nonverbal behavior: dimensions of style. J Personal Soc Psychol. 1992;63(1):133–45.

  39. MacWhinney B, Wagner J. Transcribing, searching and data sharing: the CLAN software and the TalkBank data repository. Gespraechsforschung. 2010;11:154–73.

  40. Boersma P, Weenink D. Praat: doing phonetics by computer; 2013. Retrieved May 1, 2013, from http://www.praat.org/.

  41. Kipp M. Gesture generation by imitation—from human behavior to computer character animation. Ph.D. thesis, Saarland University. Boca Raton, FL: Dissertation.com; 2004.

  42. Allwood J, Cerrato L, Jokinen K, Navarretta C, Paggio P. The MUMIN coding scheme for the annotation of feedback, turn management and sequencing. In: Multimodal corpora for modelling human multimodal behaviour, special issue of Int J Lang Resour Eval. 2007;41(3–4):273–87.

  43. Navarretta C. Anaphora and gestures in multimodal communication. In: Hendrickx I, Branco A, Devi L, Mitkov R, editors. Proceedings of the 8th discourse anaphora and anaphor resolution colloquium (DAARC 2011). Faro, Portugal: Edicoes Colibri; 2011. p. 171–81.

  44. Duncan S. McNeill lab coding methods. Technical report. McNeill Lab; 2004.

  45. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20(1):37–46.

  46. Jokinen K, Navarretta C, Paggio P. Distinguishing the communicative functions of gestures. In: Proceedings of the 5th MLMI, LNCS 5237. Utrecht: Springer; 2008. p. 38–49.

  47. Navarretta C, Allwood J, Ahlsén E, Jokinen K, Paggio P. Creating comparable multimodal corpora for Nordic languages. In: Proceedings of the 18th Nordic conference of computational linguistics. Riga, Latvia: NEALT; 2011. p. 153–60.

  48. Paggio P, Navarretta C. Classifying the feedback function of head movements and face expressions. In: Proceedings of LREC 2012 workshop multimodal corpora—how should multimodal corpora deal with the situation? Istanbul, Turkey; 2012. p. 2494–99.

  49. Nobe S. Where do most spontaneous representational gestures actually occur with respect to speech? In: McNeill D, editor. Language and gesture. Cambridge: CUP; 2000. p. 186–98.

  50. Chui K. Temporal patterning of speech and iconic gestures in conversational discourse. J Pragmat. 2005;37:871–87.

  51. Loehr D. Gesture and intonation. PhD Thesis, Georgetown University; 2004.

  52. Loehr D. Aspects of rhythm in gesture and speech. Gesture. 2007;7(2):179–204.

  53. Ferré G. Timing relationships between speech and co-verbal gestures in spontaneous French. In: Proceedings of LREC workshop on multimodal corpora. ELRA, Valletta Malta; 2010.

  54. Leonard T, Cummins F. The temporal relation between beat gestures and speech. Lang Cogn Process. 2011;26(10):1457–71.

  55. Loehr D. Temporal, structural, and pragmatic synchrony between intonation and gesture. Lab Phonol. 2012;3(1):71–89.

  56. Trouvain J. Tempo variation in speech production. Implications for speech synthesis. Doctoral thesis. Saarbrücken: Saarland University; 2004.

  57. Jacewicz E, Fox RA, Wei L. Between-speaker and within-speaker variation in speech tempo of American English. J Acoust Soc Am. 2010;128(2):832–50.

  58. Witten IH, Frank E. Data mining: practical machine learning tools and techniques. 2nd ed. San Francisco: Morgan Kaufmann; 2005.

  59. Platt J. Fast training of support vector machines using sequential minimal optimization. In: Schoelkopf B, Burges C, Smola A, editors. Advances in kernel methods–support vector learning. Cambridge, MA: MIT Press; 1998. p. 41–65.

  60. Peirce CS. Collected papers of Charles Sanders Peirce. Hartshorne C, Weiss P, Burks A, editors. 8 vols. Cambridge, MA: Harvard University Press; 1931–1958.

  61. Maynard S. Interactional functions of a nonverbal sign: head movement in Japanese dyadic casual conversation. J Pragmat. 1987;11:589–606.

  62. Kita S, Özyurek A. How does spoken language shape iconic gestures? In: Duncan S, Cassel J, Levy E, editors. Gesture and the dynamic dimension of language. Amsterdam: Benjamins; 2007. p. 67–74.

  63. Togneri R, Pullella D. An overview of speaker identification: accuracy and robustness issues. IEEE Circuits Syst Mag. 2011;11(2):23–61.

  64. Navarretta C, Paggio P. Classifying multimodal turn management in Danish dyadic first encounters. In: Proceedings of the 19th Nordic conference of computational linguistics NoDaLiDa; 2013. p. 133–46.

  65. Navarretta C. Transfer learning in multimodal corpora. In: Proceedings of the 4th IEEE international conference on cognitive infocommunications (CogInfoCom2013), Budapest, Hungary; 2013. p. 195–200.

Acknowledgments

Thanks to the researchers at the University of Southern Denmark for the corpus collection, to the Danish Research Councils, and the participants of the dk-clarin project. Furthermore, thanks go to Jens Allwood, Elisabeth Ahlsén, Kristiina Jokinen and last but not least Patrizia Paggio for the many inspiring discussions.

Author information

Corresponding author

Correspondence to Costanza Navarretta.


Cite this article

Navarretta, C. The Automatic Identification of the Producers of Co-occurring Communicative Behaviours. Cogn Comput 6, 689–698 (2014). https://doi.org/10.1007/s12559-014-9269-9
