Abstract
Laughter is a key element of human-human interaction, occurring surprisingly frequently in multi-party conversation. In meetings, laughter accounts for almost 10% of vocalization effort by time, and is known to be relevant for topic segmentation and the automatic characterization of affect. We present a system for the detection of laughter, and its attribution to specific participants, which relies on simultaneously decoding the vocal activity of all participants given multi-channel recordings. The proposed framework allows us to disambiguate laughter and speech not only acoustically, but also by constraining the number of simultaneous speakers and the number of simultaneous laughers independently, since participants tend to take turns speaking but laugh together. We present experiments on 57 hours of meeting data, containing almost 11000 unique instances of laughter.
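To illustrate how independent limits on simultaneous speakers and simultaneous laughers shrink the joint search space, here is a minimal sketch. The abstract does not give the state inventory or the actual limits, so the three per-participant states (non-vocalizing, speech, laughter), the function name `allowed_joint_states`, and the limits used below are assumptions made for illustration only, not the authors' configuration.

```python
from itertools import product

# Hypothetical per-participant states: N = not vocalizing, S = speech, L = laughter.
STATES = ("N", "S", "L")


def allowed_joint_states(num_participants, max_speakers, max_laughers):
    """Enumerate joint states of all participants, keeping only those with
    at most `max_speakers` speaking and at most `max_laughers` laughing.
    The two limits are applied independently of each other."""
    return [
        joint
        for joint in product(STATES, repeat=num_participants)
        if joint.count("S") <= max_speakers and joint.count("L") <= max_laughers
    ]


# Illustrative limits: few simultaneous speakers, but everyone may laugh together.
states = allowed_joint_states(num_participants=4, max_speakers=2, max_laughers=4)
print(len(states), "of", 3 ** 4, "joint states remain")
```

A tight speaker limit paired with a loose laugher limit encodes the observation that participants tend to take turns speaking but laugh together: a joint decoder restricted to the surviving states cannot hypothesize many overlapping talkers, yet it can still hypothesize group laughter.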
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Laskowski, K., Schultz, T. (2008). Detection of Laughter-in-Interaction in Multichannel Close-Talk Microphone Recordings of Meetings. In: Popescu-Belis, A., Stiefelhagen, R. (eds) Machine Learning for Multimodal Interaction. MLMI 2008. Lecture Notes in Computer Science, vol 5237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85853-9_14
DOI: https://doi.org/10.1007/978-3-540-85853-9_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85852-2
Online ISBN: 978-3-540-85853-9
eBook Packages: Computer Science, Computer Science (R0)