Automatic analysis of multiparty meetings


Abstract

This paper is about the recognition and interpretation of multiparty meetings captured as audio, video and other signals. This is a challenging task since the meetings consist of spontaneous and conversational interactions between a number of participants: it is a multimodal, multiparty, multistream problem. We discuss the capture and annotation of the Augmented Multiparty Interaction (AMI) meeting corpus, the development of a meeting speech recognition system, and systems for the automatic segmentation, summarization and social processing of meetings, together with some example applications based on these systems.


Author information


Corresponding author

Correspondence to Steve Renals.


About this article

Cite this article

Renals, S. Automatic analysis of multiparty meetings. Sadhana 36, 917–932 (2011). https://doi.org/10.1007/s12046-011-0051-3
