Multimedia Tools and Applications, Volume 77, Issue 1, pp 503–518

Modeling the synchrony between interacting people: application to role recognition

  • Sheng Fang
  • Catherine Achard
  • Séverine Dubuisson


The study of social interactions has attracted increasing attention. Role recognition is one of its possible applications and the focus of this study. This article proposes approaches to automatically recognize the roles of meeting participants by modeling the synchrony of temporal nonverbal audio features. In our approach, the Influence Model (IM), a model similar to a Hidden Markov Model (HMM), is used to model this synchrony and to extract from the input data a feature vector that contains information about both temporal transitions (intra-personal data) and interactions between participants (inter-personal data). This model of the meeting is used as input to a Random Forests (RFs) classifier for the role recognition task. The experiments are performed on 138 meetings (approximately 45 hours of recordings) from the Augmented Multiparty Interaction (AMI) Corpus. Accuracy scores show that this combination of generative (IM) and discriminative (RFs) approaches outperforms state-of-the-art role recognition rates.
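As a rough illustration of what a feature vector holding both intra-personal and inter-personal information might look like, the sketch below estimates transition probabilities by simple counting on observed speaking/silent sequences. This is a deliberate simplification and not the authors' method: the Influence Model works with hidden states and learned influence weights, neither of which is modelled here. The function names, the two-state encoding, the smoothing constant, and the toy data are all assumptions made for this example.

```python
def pairwise_transitions(source, target, n_states=2, smoothing=1.0):
    """Estimate P(target[t] = b | source[t-1] = a) by smoothed counting."""
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for prev_state, next_state in zip(source[:-1], target[1:]):
        counts[prev_state][next_state] += 1.0
    # Normalise each row so that it is a conditional distribution.
    return [[c / sum(row) for c in row] for row in counts]


def synchrony_features(sequences, person, n_states=2):
    """Build one participant's feature vector (hypothetical layout).

    The first block holds the participant's own transitions
    (intra-personal); each following block holds the transitions
    conditioned on another participant's previous state (inter-personal).
    """
    features = []
    own = pairwise_transitions(sequences[person], sequences[person], n_states)
    features.extend(p for row in own for p in row)
    for other in range(len(sequences)):
        if other == person:
            continue
        cross = pairwise_transitions(sequences[other], sequences[person], n_states)
        features.extend(p for row in cross for p in row)
    return features


# Toy meeting: 3 participants, 10 time steps, 1 = speaking, 0 = silent.
meeting = [
    [1, 1, 1, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
]
features = synchrony_features(meeting, person=0)
print(len(features))
```

One such vector per participant could then be passed to a discriminative classifier; a random forest implementation such as scikit-learn's `RandomForestClassifier` would be one possible choice, though the source does not state which implementation was used.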


Role recognition, Influence model, Interaction modelling, Synchrony



This work was performed within the Labex SMART (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Sorbonne Universités, UPMC Univ Paris 06, CNRS, UMR 7222, Paris, France
