Automatic Cluster Complexity and Quantity Selection: Towards Robust Speaker Diarization

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Abstract

The goal of speaker diarization is to determine where each participant speaks in a recording. One of the most commonly used techniques is agglomerative clustering, in which some number of initial models are iteratively merged until their number matches the number of speakers present. The choice of model complexity, topology, and number of initial models is vital to the final outcome of the clustering algorithm. In prior systems, these parameters were assigned directly from development data and were the same for all recordings. In this paper we present three techniques that select the parameters individually for each recording, yielding a system that is more robust to changes in the data. Although the choice of these values still depends on tunable parameters, those parameters are less sensitive to changes in the acoustic data and to how the algorithm distributes data among the different clusters. We show that, by using the three techniques, we achieve an improvement of up to 8% relative on the development set and 19% relative on the test set over prior systems.
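
As a rough illustration of the agglomerative (bottom-up) clustering idea mentioned in the abstract, the following Python sketch starts from several initial clusters and repeatedly merges the closest pair until no pair is close enough. It is not the authors' system: the one-dimensional features, the distance between cluster means, and the names `agglomerate` and `stop_distance` are all assumptions made for this example, standing in for whatever acoustic models and merge criterion a real diarization system would use.

```python
from itertools import combinations
from statistics import mean


def agglomerate(initial_clusters, stop_distance):
    """Merge the closest pair of clusters until none is closer than stop_distance.

    Illustrative only: each cluster is a list of 1-D feature values, and the
    distance between two clusters is the absolute difference of their means.
    """
    clusters = [list(c) for c in initial_clusters]
    while len(clusters) > 1:
        # Pick the pair of cluster indices (i < j) with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: abs(mean(clusters[p[0]]) - mean(clusters[p[1]])),
        )
        if abs(mean(clusters[i]) - mean(clusters[j])) >= stop_distance:
            break  # stopping criterion: remaining clusters are well separated
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Five initial clusters drawn from three well-separated groups of values.
initial = [[0.0, 0.1], [0.2], [5.0], [5.1, 5.3], [9.9]]
result = agglomerate(initial, stop_distance=1.0)
print(len(result), result)
```

In this toy setting the number of initial clusters (five) and the stopping threshold are fixed by hand, which mirrors the kind of per-recording parameter choice the abstract says the paper aims to automate.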

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Anguera, X., Wooters, C., Hernando, J. (2006). Automatic Cluster Complexity and Quantity Selection: Towards Robust Speaker Diarization. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_22

  • DOI: https://doi.org/10.1007/11965152_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69267-6

  • Online ISBN: 978-3-540-69268-3

  • eBook Packages: Computer Science, Computer Science (R0)
