The AMI Speaker Diarization System for NIST RT06s Meeting Data

  • David A. van Leeuwen
  • Marijn Huijbregts
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4299)


We describe the systems submitted to the NIST RT06s evaluation for the Speech Activity Detection (SAD) and Speaker Diarization (SPKR) tasks. For speech activity detection, a new analysis methodology is presented that generalizes the Detection Error Tradeoff analysis commonly used in speaker detection tasks. The speaker diarization systems are based on the TNO and ICSI systems submitted for RT05s. For the conference room evaluation Single Distant Microphone condition, the SAD system performs well at a 4.23 % error rate, and the ‘HMM-BIC’ SPKR system performs competitively at an error rate of 37.2 %, including overlapping speech.
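As a rough illustration of the standard Detection Error Tradeoff idea applied to frame-level speech activity detection, the sketch below sweeps a decision threshold over per-frame speech scores and records the resulting miss and false-alarm rates. It is not the generalized methodology of the paper, which the abstract does not spell out; the function name `det_points`, the score convention (higher means more speech-like), and the toy data are all assumptions made for this example.

```python
from typing import List, Tuple


def det_points(scores: List[float],
               labels: List[bool]) -> List[Tuple[float, float, float]]:
    """Return (threshold, miss_rate, false_alarm_rate) per candidate threshold.

    Assumption: a frame is classified as speech when its score >= threshold.
    labels[i] is True when frame i is speech in the reference.
    """
    n_speech = sum(labels)
    n_nonspeech = len(labels) - n_speech
    if n_speech == 0 or n_nonspeech == 0:
        raise ValueError("need at least one speech and one non-speech frame")

    points = []
    # Every distinct score is a candidate operating point.
    for thr in sorted(set(scores)):
        # Miss: a reference speech frame scored below the threshold.
        misses = sum(1 for s, is_speech in zip(scores, labels)
                     if is_speech and s < thr)
        # False alarm: a reference non-speech frame scored at or above it.
        false_alarms = sum(1 for s, is_speech in zip(scores, labels)
                           if not is_speech and s >= thr)
        points.append((thr, misses / n_speech, false_alarms / n_nonspeech))
    return points


# Toy data (hypothetical): four frames, two of them speech.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [False, False, True, True]
toy_points = det_points(toy_scores, toy_labels)
```

Plotting the miss rate against the false-alarm rate of these points, conventionally on normal-deviate axes, gives a DET curve; a single operating point such as the reported SAD error rate corresponds to one threshold choice along that curve.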


Hidden Markov Model, Bayesian Information Criterion, Gaussian Mixture Model, Viterbi Decoder, Lecture Room





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • David A. van Leeuwen (1)
  • Marijn Huijbregts (2)
  1. TNO Human Factors, Soesterberg, The Netherlands
  2. Department of EEMCS, Human Media Interaction, University of Twente, Enschede, The Netherlands
