Abstract
This paper presents a new multimodal system for analyzing group dynamics and interaction. The system consists of a microphone array and multiview video cameras mounted on a digital signage display, which serves as a support for interaction. We show that visual information processing can be used to localize nonverbal communication events and that these events can be synchronized with audio information. Our contribution is twofold: 1) we present a scalable, portable system for sensing multimodal interaction among multiple people, and 2) we propose a general framework for modeling audio-visual (A/V) multimodal interaction that employs speaker diarization for audio processing and hybrid dynamical systems (HDS) for video processing. HDS are used to represent communication dynamics between multiple people by capturing the temporal structure of their head motions. Experiments on real-world group communication demonstrate the framework applied to joint attention estimation. We believe the proposed framework offers a promising basis for further research.
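As a rough illustration of what synchronizing audio and visual events can mean in practice, the sketch below pairs speech segments (as a diarization step might produce) with head-motion intervals (as a video model might produce) whenever they overlap in time. This is not the authors' method: the `Interval` type, the `overlaps` function, the labels, and the timings are all hypothetical, chosen only to show interval-based alignment under the assumption that both modalities share a common clock.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    label: str    # hypothetical label: a speaker ID or a head-motion mode
    start: float  # seconds on a clock assumed common to audio and video
    end: float    # seconds


def overlaps(audio: List[Interval], video: List[Interval],
             min_overlap: float = 0.0) -> List[Tuple[str, str, float, float]]:
    """Return (audio_label, video_label, start, end) for every pair of
    intervals whose temporal overlap is strictly longer than min_overlap."""
    out = []
    for a in audio:
        for v in video:
            start = max(a.start, v.start)
            end = min(a.end, v.end)
            if end - start > min_overlap:
                out.append((a.label, v.label, start, end))
    # Order the co-occurrences chronologically.
    return sorted(out, key=lambda r: (r[2], r[3]))


# Made-up example data: two speech turns and two head-motion events.
speech = [Interval("speaker_A", 0.0, 2.5), Interval("speaker_B", 2.5, 5.0)]
head = [Interval("nod_B", 1.0, 2.0), Interval("turn_A", 2.25, 3.0)]

events = overlaps(speech, head)
for audio_label, video_label, start, end in events:
    print(f"{start:.2f}-{end:.2f}s: {video_label} during {audio_label}")
```

The nested loop is quadratic in the number of intervals, which is fine for short recordings; a sweep over sorted interval lists would be the natural replacement for longer sessions.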
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tung, T., Gomez, R., Kawahara, T., Matsuyama, T. (2012). Group Dynamics and Multimodal Interaction Modeling Using a Smart Digital Signage. In: Fusiello, A., Murino, V., Cucchiara, R. (eds) Computer Vision – ECCV 2012. Workshops and Demonstrations. ECCV 2012. Lecture Notes in Computer Science, vol 7583. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33863-2_36
DOI: https://doi.org/10.1007/978-3-642-33863-2_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33862-5
Online ISBN: 978-3-642-33863-2
eBook Packages: Computer Science, Computer Science (R0)