Robust Speaker Diarization for Meetings: ICSI RT06S Meetings Evaluation System

Anguera, Xavier; Wooters, Chuck; Pardo, Jose M.

doi:10.1007/11965152_31

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

In this paper we present the ICSI speaker diarization system submitted to the NIST Rich Transcription evaluation (RT06s) [1], conducted in the meetings domain. The system builds on our RT05s system, which uses agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure both to decide which pair of clusters to merge and to determine when to stop merging. In this year's system we have eliminated any remaining need for training data, thereby increasing robustness. Our primary system introduces several improvements over last year's. First, we use a new training-free speech/non-speech detection algorithm. Second, we introduce a new algorithm for system initialization. Third, we use a frame purification algorithm to increase cluster discriminability. Finally, we describe the use of inter-channel delays as features. We explain each of these improvements and report our system's results on the official evaluation data, scored against both hand-aligned references and forced alignments. We also analyze some of the results and propose improvements.
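
The merge-and-stop logic mentioned in the abstract can be illustrated with a minimal sketch. It is not the ICSI system: it assumes one-dimensional features, a single Gaussian per cluster, and the standard penalized delta-BIC with a hypothetical weight `lam`, whereas the paper uses a modified BIC measure that is not reproduced here. The function names `gaussian_loglik`, `delta_bic` and `agglomerative_cluster` are illustrative only.

```python
import math
import random


def gaussian_loglik(xs):
    """Log-likelihood of xs under its own maximum-likelihood 1-D Gaussian."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)  # floor avoids log(0)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def delta_bic(a, b, lam=1.0):
    """Standard delta-BIC for merging clusters a and b (assumption: not the
    paper's modified BIC). Positive means one Gaussian is preferred over two."""
    n = len(a) + len(b)
    # Two separate Gaussians use 2 more parameters (mean, variance) than one.
    penalty = 0.5 * lam * 2 * math.log(n)
    return gaussian_loglik(a + b) - gaussian_loglik(a) - gaussian_loglik(b) + penalty


def agglomerative_cluster(segments, lam=1.0):
    """Repeatedly merge the best-scoring pair; stop when no pair scores > 0."""
    clusters = [list(s) for s in segments]
    while len(clusters) > 1:
        score, i, j = max(
            (delta_bic(clusters[i], clusters[j], lam), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if score <= 0:  # stopping criterion: no merge improves BIC
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Toy data: eight segments drawn from two synthetic "speakers".
    rng = random.Random(0)
    segments = [[rng.gauss(mu, 1.0) for _ in range(200)]
                for mu in (0.0, 10.0) for _ in range(4)]
    print(len(agglomerative_cluster(segments)), "clusters found")
```

The same score serves as both the merge criterion and the stopping criterion, which is the property the abstract attributes to the BIC-based clustering; a real system would operate on multi-dimensional acoustic features with richer cluster models.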

References

[1] NIST Rich Transcription evaluations, website: http://www.nist.gov/speech/tests/rt
[2] Istrate, D., Fredouille, C., Meignier, S., Besacier, L., Bonastre, J.-F.: NIST RT05S evaluation: Pre-processing techniques and speaker diarization on multiple microphone meetings. In: NIST 2005 Spring Rich Transcription Evaluation Workshop, Edinburgh, UK (July 2005)
[3] Cassidy, S.: The Macquarie speaker diarization system for RT05S. In: NIST 2005 Spring Rich Transcription Evaluation Workshop, Edinburgh, UK (July 2005)
[4] van Leeuwen, D.: The TNO speaker diarization system for NIST RT05s for meeting data. In: NIST 2005 Spring Rich Transcription Evaluation Workshop, Edinburgh, UK (July 2005)
[5] Anguera, X., Wooters, C., Peskin, B., Aguilo, M.: Robust speaker segmentation for meetings: The ICSI-SRI spring 2005 diarization system. In: RT05s Meetings Recognition Evaluation, Edinburgh, Great Britain (July 2005)
[6] Ajmera, J., Wooters, C.: A robust speaker clustering algorithm. In: ASRU 2003, US Virgin Islands, USA (December 2003)
[7] Anguera, X., Wooters, C., Hernando, J.: Automatic cluster complexity and quantity selection: Towards robust speaker diarization. In: Renals, S., Bengio, S., Fiscus, J.G. (eds.) MLMI 2006. LNCS, vol. 4299, pp. 248–256. Springer, Heidelberg (2006)
[8] Anguera, X., Wooters, C., Hernando, J.: Speaker diarization for multi-party meetings using acoustic fusion. In: IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Puerto Rico, USA (November 2005)
[9] Anguera, X., Aguilo, M., Wooters, C., Nadeu, C., Hernando, J.: Hybrid speech/non-speech detector applied to speaker diarization of meetings. In: Speaker Odyssey 2006, Puerto Rico, USA (June 2006)
[10] Anguera, X., Wooters, C., Hernando, J.: Friends and enemies: A novel initialization for speaker diarization. In: Proc. ICSLP, Pittsburgh, USA (September 2006) (to appear)
[11] Anguera, X., Wooters, C., Hernando, J.: Purity algorithms for speaker diarization of meetings data. In: ICASSP, Toulouse, France (May 2006)
[12] Chen, S.S., Gopalakrishnan, P.S.: Speaker, environment and channel change detection and clustering via the Bayesian information criterion. In: Proceedings DARPA Broadcast News Transcription and Understanding Workshop, Virginia, USA (February 1998)
[13] Pardo, J.M., Anguera, X., Wooters, C.: Speaker diarization for multiple distant microphone meetings: Mixing acoustic features and inter-channel time differences. In: Proc. ICSLP (September 2006)
[14] Janin, A., Stolcke, A., Anguera, X., Boakye, K., Cetin, O., Frankel, J., Zheng, J.: The ICSI-SRI spring 2006 meeting recognition system. In: Proceedings of the Rich Transcription 2006 Spring Meeting Recognition Evaluation, Washington, USA (May 2006)
[15] Stolcke, A., Anguera, X., Boakye, K., Cetin, O., Grezl, F., Janin, A., Mandal, A., Peskin, B., Wooters, C., Zheng, J.: Further progress in meeting recognition: The ICSI-SRI spring 2005 speech-to-text evaluation system. In: RT05s Meetings Recognition Evaluation, Edinburgh, Great Britain (July 2005)

Author information

Authors and Affiliations

International Computer Science Institute, Berkeley, CA, 94704, USA
Xavier Anguera, Chuck Wooters & Jose M. Pardo
Technical University of Catalonia, Barcelona, Spain
Xavier Anguera
Universidad Politecnica de Madrid, Madrid, Spain
Jose M. Pardo

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
National Institute of Standards and Technology, 100 Bureau Drive Stop 8940, Gaithersburg, MD, 20899, USA
Jonathan G. Fiscus

About this paper

Cite this paper

Anguera, X., Wooters, C., Pardo, J.M. (2006). Robust Speaker Diarization for Meetings: ICSI RT06S Meetings Evaluation System. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_31

DOI: https://doi.org/10.1007/11965152_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science, Computer Science (R0)