Group Dynamics and Multimodal Interaction Modeling Using a Smart Digital Signage

  • Tony Tung
  • Randy Gomez
  • Tatsuya Kawahara
  • Takashi Matsuyama
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7583)


This paper presents a new multimodal system for the analysis of group dynamics and interaction. The framework consists of a microphone array and multiview video cameras mounted on a digital signage display, which serves as a support for interaction. We show that visual information processing can localize nonverbal communication events and synchronize them with audio information. Our contribution is twofold: 1) we present a scalable, portable system for multimodal interaction sensing of multiple people, and 2) we propose a general framework for modeling audiovisual multimodal interaction that employs speaker diarization for audio processing and hybrid dynamical systems (HDS) for video processing. HDS represent communication dynamics between multiple people by capturing the characteristics of temporal structures in head motions. Experimental results show real-world group communication processing for joint attention estimation. We believe the proposed framework is very promising for further research.
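The core idea behind the HDS component, segmenting a continuous motion signal into intervals, each explained by a simple linear dynamical model, can be illustrated with a minimal sketch. The paper's actual HDS formulation is not reproduced here; this toy example stands in for it with scalar AR(1) models of a one-dimensional head-angle trace, and all function names and the synthetic data are illustrative assumptions:

```python
import numpy as np

def fit_ar1(x):
    """Least-squares fit of a scalar AR(1) model x[t] ~ a * x[t-1]."""
    return np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

def segment_by_models(x, coeffs):
    """Label each time step by the AR(1) model with the smallest
    one-step prediction error, then merge labels into intervals."""
    preds = np.stack([a * x[:-1] for a in coeffs])   # shape (K, T-1)
    errors = (preds - x[1:]) ** 2
    labels = np.argmin(errors, axis=0)
    intervals, start = [], 0
    for t in range(1, len(labels)):
        if labels[t] != labels[t - 1]:
            intervals.append((start, t, int(labels[t - 1])))
            start = t
    intervals.append((start, len(labels), int(labels[-1])))
    return intervals

# Synthetic head-angle trace: a "still" regime followed by a decaying nod.
still = np.full(50, 1.0)             # near-constant regime (a close to 1)
nod = 0.8 ** np.arange(50)           # decaying motion regime (a close to 0.8)
x = np.concatenate([still, nod])

a_still = fit_ar1(still)             # recovers 1.0
a_nod = fit_ar1(nod)                 # recovers 0.8
intervals = segment_by_models(x, [a_still, a_nod])
# intervals: two runs, one per dynamical regime
```

A full HDS would additionally learn the number of linear systems and the switching structure jointly, and operate on multidimensional head-pose features rather than a scalar angle; the sketch only shows the interval-labeling step.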


Keywords: Gaussian Mixture Model, Joint Attention, Linear Dynamical System, Visual Information Processing, Microphone Array



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Tony Tung 1
  • Randy Gomez 1
  • Tatsuya Kawahara 1
  • Takashi Matsuyama 1

  1. Academic Center for Computing and Media Studies, and Graduate School of Informatics, Kyoto University, Japan
