Abstract
Conversations in poster sessions at academic events, referred to as poster conversations, pose interesting and challenging problems in multi-modal, multi-party interaction. This article gives an overview of our CREST project on the smart posterboard for multi-modal conversation analysis. The smart posterboard has multiple sensing devices to record poster conversations, so that we can review who came to the poster and what kinds of questions or comments they made. The conversation analysis combines speech and image processing, including face and eye-gaze tracking, speech enhancement, and speaker diarization. We show that eye-gaze information is useful both for predicting turn-taking and for improving speaker diarization. Moreover, high-level indexing of the audience's interest and comprehension level is explored based on their multi-modal behaviors during the conversation. This is realized by predicting the audience's speech acts, such as questions and reactive tokens.
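To illustrate the kind of audio-visual integration described above, the following is a minimal sketch (not the system's actual algorithm) of fusing per-frame acoustic speaker likelihoods, e.g. derived from direction-of-arrival estimates of a microphone array, with an eye-gaze cue, based on the observation that a participant who attracts the gaze of others is more likely to be the current or next speaker. The function name, the smoothing, and the log-linear interpolation weight are all illustrative assumptions.

```python
import numpy as np

def diarize_frame(audio_scores, gaze_counts, alpha=0.7):
    """Pick the most likely speaker for one frame by fusing two cues.

    audio_scores: per-participant acoustic likelihoods (e.g. from DOA).
    gaze_counts:  number of other participants gazing at each person.
    alpha:        interpolation weight for the audio cue (illustrative).
    """
    audio = np.asarray(audio_scores, dtype=float)
    gaze = np.asarray(gaze_counts, dtype=float)
    # Normalize each cue to a probability distribution;
    # add-one smoothing keeps the gaze prior nonzero for everyone.
    audio_p = audio / audio.sum()
    gaze_p = (gaze + 1.0) / (gaze + 1.0).sum()
    # Log-linear interpolation of the acoustic and gaze cues.
    fused = alpha * np.log(audio_p) + (1.0 - alpha) * np.log(gaze_p)
    return int(np.argmax(fused))
```

For example, when the acoustic scores of two participants are close, gaze directed at one of them can tip the decision toward that participant, which is the intuition behind using eye-gaze to improve speaker diarization.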
Notes
- 1.
We used different Japanese wording for interest and for surprise to enhance the reliability of the evaluation; we adopt a result only if the two match.
- 2.
This does not mean the presenter actually answered with a simple "Yes" or "No".
Acknowledgments
This work was conducted by the members of the CREST project including Hiromasa Yoshimoto, Tony Tung, Yukoh Wakabayashi, Kouhei Sumi, Zhi-Qiang Chang, Takuma Iwatate, Soichiro Hayashi, Koji Inoue, Katsuya Takanashi (Kyoto University) and Yuji Onuma, Shunsuke Nakai, Ryoichi Miyazaki, Hiroshi Saruwatari (Nara Institute of Science and Technology).
Copyright information
© 2016 Springer Japan
About this chapter
Cite this chapter
Kawahara, T. (2016). Smart Posterboard: Multi-modal Sensing and Analysis of Poster Conversations. In: Nishida, T. (eds) Human-Harmonized Information Technology, Volume 1. Springer, Tokyo. https://doi.org/10.1007/978-4-431-55867-5_9
DOI: https://doi.org/10.1007/978-4-431-55867-5_9
Print ISBN: 978-4-431-55865-1
Online ISBN: 978-4-431-55867-5