Abstract
Conversations in poster sessions at academic events, referred to as poster conversations, pose interesting and challenging problems in multi-modal, multi-party interaction. This article gives an overview of our CREST project on the smart posterboard for multi-modal conversation analysis. The smart posterboard has multiple sensing devices to record poster conversations, so that we can review who came to the poster and what kinds of questions or comments they made. The conversation analysis combines speech and image processing, including face and eye-gaze tracking, speech enhancement, and speaker diarization. We show that eye-gaze information is useful both for predicting turn-taking and for improving speaker diarization. Moreover, high-level indexing of the audience's interest and comprehension level is explored based on their multi-modal behaviors during the conversation. This is realized by predicting the audience's speech acts, such as questions and reactive tokens.
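To illustrate the kind of audio-visual integration described above, the following is a minimal sketch (not the system's actual algorithm) of fusing per-frame acoustic speaker likelihoods, e.g. derived from direction-of-arrival estimates of a microphone array, with an eye-gaze cue, based on the observation that a participant who attracts the gaze of others is more likely to be the current or next speaker. The function name, the smoothing, and the log-linear interpolation weight are all illustrative assumptions.

```python
import numpy as np

def diarize_frame(audio_scores, gaze_counts, alpha=0.7):
    """Pick the most likely speaker for one frame by fusing two cues.

    audio_scores: per-participant acoustic likelihoods (e.g. from DOA).
    gaze_counts:  number of other participants gazing at each person.
    alpha:        interpolation weight for the audio cue (illustrative).
    """
    audio = np.asarray(audio_scores, dtype=float)
    gaze = np.asarray(gaze_counts, dtype=float)
    # Normalize each cue to a probability distribution;
    # add-one smoothing keeps the gaze prior nonzero for everyone.
    audio_p = audio / audio.sum()
    gaze_p = (gaze + 1.0) / (gaze + 1.0).sum()
    # Log-linear interpolation of the acoustic and gaze cues.
    fused = alpha * np.log(audio_p) + (1.0 - alpha) * np.log(gaze_p)
    return int(np.argmax(fused))
```

For example, when the acoustic scores of two participants are close, gaze directed at one of them can tip the decision toward that participant, which is the intuition behind using eye-gaze to improve speaker diarization.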
Notes
- 1.
We used different Japanese wording for interest and for surprise to enhance the reliability of the evaluation; we adopt a result only if the two match.
- 2.
This does not mean the presenter actually answered with a simple "Yes" or "No".
Acknowledgments
This work was conducted by the members of the CREST project including Hiromasa Yoshimoto, Tony Tung, Yukoh Wakabayashi, Kouhei Sumi, Zhi-Qiang Chang, Takuma Iwatate, Soichiro Hayashi, Koji Inoue, Katsuya Takanashi (Kyoto University) and Yuji Onuma, Shunsuke Nakai, Ryoichi Miyazaki, Hiroshi Saruwatari (Nara Institute of Science and Technology).
Copyright information
© 2016 Springer Japan
About this chapter
Cite this chapter
Kawahara, T. (2016). Smart Posterboard: Multi-modal Sensing and Analysis of Poster Conversations. In: Nishida, T. (eds) Human-Harmonized Information Technology, Volume 1. Springer, Tokyo. https://doi.org/10.1007/978-4-431-55867-5_9
DOI: https://doi.org/10.1007/978-4-431-55867-5_9
Print ISBN: 978-4-431-55865-1
Online ISBN: 978-4-431-55867-5