Personal and Ubiquitous Computing, Volume 14, Issue 8, pp 695–702

Multimodal sensing, recognizing and browsing group social dynamics

  • Zhiwen Yu
  • Zhiyong Yu
  • Xingshe Zhou
  • Yuichi Nakamura
Original Article

Abstract

Group social dynamics are crucial for determining whether a meeting was well organized and its conclusions well reasoned. In this paper, we propose multimodal approaches for sensing, recognizing and browsing social dynamics, specifically human semantic interactions and group interests, in small group meetings. Unlike physical interactions (e.g., turn-taking and addressing), the human interactions considered here incorporate semantics, i.e., a user's intention or attitude toward a topic. Group interests are defined as episodes in which participants engage in an emphatic and heated discussion. We adopt multiple sensors, such as video cameras, microphones and motion sensors, for meeting capture. Multimodal methods are proposed for human interaction recognition and group interest recognition based on a variety of features. A graphical user interface, the MMBrowser, is presented for browsing group social dynamics. Experimental results demonstrate the feasibility of the proposed approaches.
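The paper itself gives no code; as a purely illustrative sketch of the recognition step described above, the snippet below fuses per-segment feature vectors from the three sensing modalities (audio, video, motion) and assigns an interaction label with a simple nearest-centroid rule, a stand-in for a trained classifier. All feature values, centroids and labels here are invented for illustration and are not taken from the paper.

```python
# Illustrative sketch only: the paper's actual features and classifiers differ.
# Fuse per-segment features from three modalities, then label the segment by
# the closest class centroid (a stand-in for a trained recognizer).
from math import dist


def fuse(audio, visual, motion):
    """Concatenate per-modality feature vectors into one fused vector."""
    return tuple(audio) + tuple(visual) + tuple(motion)


def nearest_centroid(x, centroids):
    """Return the label whose centroid is closest to the fused vector x."""
    return min(centroids, key=lambda label: dist(x, centroids[label]))


# Hypothetical class centroids, as if learned from labeled meeting segments.
centroids = {
    "propose":     (0.9, 0.2, 0.7, 0.1),
    "acknowledge": (0.1, 0.8, 0.2, 0.0),
}

# A fused observation: two audio features, one visual, one motion.
x = fuse(audio=(0.85, 0.25), visual=(0.6,), motion=(0.2,))
print(nearest_centroid(x, centroids))
```

In practice each modality would contribute richer features (prosody from the microphones, head pose and gestures from the cameras and motion sensors), and the centroid rule would be replaced by a properly trained model.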


Keywords: Group social dynamics · Smart meeting · Multimodal · Human interaction · Group interest



Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (No. 60903125), the Program for New Century Excellent Talents in University, the High-Tech Program of China (863) (No. 2009AA011903), the Doctorate Foundation of Northwestern Polytechnical University of China (No. CX200814), and the Ministry of Education, Culture, Sports, Science and Technology, Japan under the project of “Cyber Infrastructure for the Information-explosion Era”.



Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  • Zhiwen Yu (email author) 1
  • Zhiyong Yu 1
  • Xingshe Zhou 1
  • Yuichi Nakamura 2

  1. School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi, China
  2. Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan
