The Diarization System for an Unknown Number of Speakers

  • Oleg Kudashev
  • Alexander Kozlov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8113)


This paper presents a speaker diarization system that can be used when the number of speakers is unknown. The proposed system is based on agglomerative clustering in conjunction with factor analysis, the Total Variability approach, and linear discriminant analysis. We present results showing that the system can be applied both to telephone recordings that contain an answering machine or a handset transfer and to summed-channel telephone or meeting recordings.
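As a rough illustration of how agglomerative clustering can cope with an unknown number of speakers, the sketch below merges the two closest clusters repeatedly and stops once the closest pair is farther apart than a threshold, so the number of speakers falls out of the stopping rule rather than being fixed in advance. This is not the authors' implementation: the toy 3-dimensional segment vectors, the cosine distance, the mean-based cluster representation, and the `stop_threshold` value are all assumptions made for the example. In the system described above, the per-segment vectors would come from the Total Variability front end followed by linear discriminant analysis.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance between two vectors (0 = same direction)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def agglomerative_cluster(vectors, stop_threshold):
    """Bottom-up clustering with a distance-based stopping rule.

    vectors: (n_segments, dim) array, one vector per speech segment.
    stop_threshold: hypothetical tuning parameter; merging stops when the
        closest pair of clusters is farther apart than this value.
    Returns an integer label per segment.
    """
    vectors = np.asarray(vectors, dtype=float)
    clusters = [[i] for i in range(len(vectors))]  # start: one cluster per segment

    while len(clusters) > 1:
        # Represent each cluster by the mean of its member vectors.
        means = [vectors[members].mean(axis=0) for members in clusters]

        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cosine_distance(means[i], means[j])
                if best is None or d < best[0]:
                    best = (d, i, j)

        d, i, j = best
        if d > stop_threshold:
            break  # remaining clusters are treated as distinct speakers

        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = np.empty(len(vectors), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Toy data (assumed): three "speakers", four noisy segment vectors each.
rng = np.random.default_rng(0)
centers = np.eye(3)
segments = np.vstack([c + 0.05 * rng.standard_normal((4, 3)) for c in centers])

labels = agglomerative_cluster(segments, stop_threshold=0.5)
n_speakers = len(set(labels.tolist()))
```

In practice the threshold would have to be tuned on held-out recordings, since it directly trades off splitting one speaker into several clusters against merging different speakers.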


Keywords: diarization, speaker segmentation, speaker recognition, clustering





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Oleg Kudashev (1)
  • Alexander Kozlov (2)
  1. National Research University of Information Technologies, Mechanics and Optics, St. Petersburg, Russia
  2. STC-innovations Ltd., St. Petersburg, Russia
