Speaker Diarization: From Broadcast News to Lectures

  • Xuan Zhu
  • Claude Barras
  • Lori Lamel
  • Jean-Luc Gauvain
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4299)


This paper presents the LIMSI speaker diarization system for lecture data, developed for the Rich Transcription 2006 Spring (RT-06S) meeting recognition evaluation. The system builds on a baseline diarization system designed for broadcast news data, which combines agglomerative clustering based on the Bayesian information criterion (BIC) with a second clustering stage that uses state-of-the-art speaker identification techniques. In the RT-04F evaluation, the baseline system achieved an overall diarization error of 8.5% on broadcast news data. On lecture data, however, it has a high missed speech error rate, so a different speech activity detection approach was explored, based on the log-likelihood ratio between speech and non-speech models trained on the seminar data. The new speaker diarization system integrating this module achieves an overall diarization error of 20.2% on the RT-06S Multiple Distant Microphone (MDM) data.
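The log-likelihood-ratio idea behind the speech activity detector can be illustrated with a minimal sketch. The abstract does not specify the features, model sizes, decision threshold, or smoothing, so everything here is an assumption made for illustration: the toy one-dimensional features, the small diagonal-covariance Gaussian mixtures, the moving-average window, and the function names `gmm_loglik` and `detect_speech` are hypothetical and are not taken from the LIMSI system.

```python
import math


def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    logs = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in gmm]
    peak = max(logs)
    # Log-sum-exp over components for numerical stability.
    return peak + math.log(sum(math.exp(l - peak) for l in logs))


def detect_speech(frames, speech_gmm, nonspeech_gmm, threshold=0.0, window=3):
    """Mark a frame as speech when the window-averaged LLR exceeds threshold."""
    llr = [gmm_loglik(x, speech_gmm) - gmm_loglik(x, nonspeech_gmm) for x in frames]
    half = window // 2
    labels = []
    for t in range(len(llr)):
        lo, hi = max(0, t - half), min(len(llr), t + half + 1)
        labels.append(sum(llr[lo:hi]) / (hi - lo) > threshold)
    return labels


# Toy models (assumed values): speech has high feature values, non-speech low.
speech_gmm = [(0.5, [4.0], [1.0]), (0.5, [6.0], [1.0])]
nonspeech_gmm = [(1.0, [0.0], [1.0])]

frames = [[0.1], [-0.2], [0.3], [5.1], [4.8], [5.3], [4.9], [0.0], [0.2]]
labels = detect_speech(frames, speech_gmm, nonspeech_gmm)
print(labels)
# [False, False, False, True, True, True, True, False, False]
```

In this sketch, raising `threshold` makes the detector more conservative about labelling speech, while lowering it reduces missed speech at the cost of more false alarms. How the actual system sets this trade-off is not stated in the abstract.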


Keywords: Gaussian Mixture Model; Baseline System; Noisy Speech; Speech Segment; Broadcast News
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xuan Zhu (1)
  • Claude Barras (1)
  • Lori Lamel (1)
  • Jean-Luc Gauvain (1)
  1. LIMSI-CNRS, France
