NIST RT’05S Evaluation: Pre-processing Techniques and Speaker Diarization on Multiple Microphone Meetings

  • Dan Istrate
  • Corinne Fredouille
  • Sylvain Meignier
  • Laurent Besacier
  • Jean François Bonastre
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3869)

Abstract

This paper presents different pre-processing techniques coupled with three speaker diarization systems, in the framework of the NIST 2005 Spring Rich Transcription evaluation campaign (RT’05S).

The pre-processing techniques aim to provide a signal quality index in order to build a single “virtual” signal from all the microphone recordings available for a meeting. This virtual signal is a weighted sum of the individual microphone signals, where the signal quality index of each microphone is derived from a signal-to-noise ratio estimate.
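As an illustration only, and not the authors' exact implementation, the sketch below shows one way such a weighted sum could be computed: each microphone channel receives a per-frame weight derived from an SNR-like quality index, and the weighted channels are summed into a single virtual signal. The function name combine_channels, the frame length, and the percentile-based noise-floor estimate are assumptions made for this example.

import numpy as np

def combine_channels(channels, frame_len=1024):
    """Frame-wise weighted sum of time-aligned microphone channels.

    channels: list of 1-D numpy arrays of equal length (one per microphone).
    Returns a 1-D numpy array (the "virtual" signal) of the same length.
    """
    x = np.stack(channels)                                   # (n_mics, n_samples)
    n_mics, n_samples = x.shape
    n_frames = n_samples // frame_len
    frames = x[:, : n_frames * frame_len].reshape(n_mics, n_frames, frame_len)

    # Per-channel, per-frame energy and a crude noise floor
    # (10th percentile of frame energies; an assumption for this sketch).
    energies = np.mean(frames ** 2, axis=2) + 1e-12          # (n_mics, n_frames)
    noise_floor = np.percentile(energies, 10, axis=1, keepdims=True)

    # SNR-like quality index turned into weights that sum to one per frame.
    snr_db = 10.0 * np.log10(energies / noise_floor)
    weights = np.clip(snr_db, 0.0, None)
    weights /= weights.sum(axis=0, keepdims=True) + 1e-12

    # Frame-wise weighted sum of the channels into one virtual signal.
    virtual = np.zeros(n_samples)
    for f in range(n_frames):
        s, e = f * frame_len, (f + 1) * frame_len
        virtual[s:e] = np.sum(weights[:, f, None] * x[:, s:e], axis=0)
    virtual[n_frames * frame_len:] = x[0, n_frames * frame_len:]  # trailing samples
    return virtual

A higher-SNR channel thus contributes more to the combined signal in a given frame, which is the intuition behind using a quality index as the mixing weight.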

Two methods are used in this paper to compute the instantaneous signal-to-noise ratio: an approach based on speech activity detection and a noise spectrum estimate. The speaker diarization task is performed using systems developed by three labs: LIA, LIUM, and CLIPS. Among the different submissions made by these labs, the best system obtained a 24.5% speaker diarization error rate for the conference subdomain and 18.4% for the lecture subdomain.
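The noise-spectrum-based SNR estimate can be illustrated with the following sketch, which compares each frame's power spectrum to a noise spectrum averaged over the quietest frames of the recording. The quantile-based noise estimator and the function instantaneous_snr are assumptions for illustration, standing in for the estimator actually used in the submitted systems.

import numpy as np

def instantaneous_snr(signal, sr=16000, frame_ms=32, noise_quantile=0.2):
    """Rough per-frame SNR estimate (in dB) from a noise spectrum taken as
    the average spectrum of the lowest-energy frames of the recording."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Power spectrum of each Hann-windowed frame.
    window = np.hanning(frame_len)
    spectra = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2

    # Noise spectrum: mean spectrum of the quietest frames
    # (a simple stand-in for a dedicated noise spectrum estimator).
    frame_power = spectra.mean(axis=1)
    threshold = np.quantile(frame_power, noise_quantile)
    noise_spectrum = spectra[frame_power <= threshold].mean(axis=0) + 1e-12

    # Per-frame SNR in dB against the average noise power.
    return 10.0 * np.log10(frame_power / noise_spectrum.mean() + 1e-12)

In the speech-activity-detection variant, the noise statistics would instead be estimated over frames labelled as non-speech by a detector, rather than by the energy quantile used here.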

Keywords

Broadcast News · Evaluation Campaign · Speaker Model · Meeting Data · Cepstral Feature

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Dan Istrate (1)
  • Corinne Fredouille (1)
  • Sylvain Meignier (2)
  • Laurent Besacier (3)
  • Jean François Bonastre (1)
  1. LIA, Avignon, France
  2. LIUM, Le Mans, France
  3. CLIPS-IMAG (UJF & CNRS & INPG), Grenoble, France
