MLMI 2005: Machine Learning for Multimodal Interaction, pp. 428-439
NIST RT’05S Evaluation: Pre-processing Techniques and Speaker Diarization on Multiple Microphone Meetings
Abstract
This paper presents several pre-processing techniques coupled with three speaker diarization systems, evaluated in the framework of the NIST 2005 Spring Rich Transcription campaign (RT’05S).
The pre-processing techniques aim to provide a signal quality index that is used to build a single “virtual” signal from all the microphone recordings available for a meeting. This virtual signal is a weighted sum of the individual microphone signals, and the signal quality index is derived from a signal-to-noise ratio.
Two methods are used in this paper to compute the instantaneous signal-to-noise ratio: one based on speech activity detection and one based on a noise spectrum estimate. The speaker diarization task is performed using systems developed by three labs: LIA, LIUM and CLIPS. Among the systems submitted by these labs, the best obtained a speaker diarization error of 24.5% on the conference subdomain and 18.4% on the lecture subdomain.
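The weighted-sum construction of the virtual signal can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula or the details of either SNR estimator, so the code assumes time-aligned channels of equal length, non-overlapping frames, a crude noise estimate (the quietest frame of each channel), and weights proportional to the per-frame SNR. The function names `frame_energies`, `frame_snr` and `virtual_signal` are hypothetical.

```python
def frame_energies(signal, frame_len):
    """Mean power of each non-overlapping frame of one channel."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def frame_snr(energies, eps=1e-12):
    """Linear per-frame SNR for one channel.

    Assumption: the quietest frame stands in for the noise power. This is
    a placeholder for the speech-activity-based and noise-spectrum-based
    estimators mentioned in the abstract, not a reproduction of them.
    """
    if not energies:
        return []
    noise = max(min(energies), eps)
    return [max(e - noise, 0.0) / noise for e in energies]


def virtual_signal(channels, frame_len=4):
    """Weighted sum of time-aligned channels, weighted per frame by SNR.

    Assumption: weights are the per-frame SNRs normalised to sum to 1.
    Frames where every channel has zero SNR fall back to equal weights.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    snrs = [frame_snr(frame_energies(ch, frame_len)) for ch in channels]
    n_frames = min(len(s) for s in snrs)
    out = []
    for f in range(n_frames):
        total = sum(s[f] for s in snrs)
        if total > 0:
            weights = [s[f] / total for s in snrs]
        else:
            weights = [1.0 / len(channels)] * len(channels)
        for t in range(f * frame_len, (f + 1) * frame_len):
            out.append(sum(w * ch[t] for w, ch in zip(weights, channels)))
    return out


# Toy example: channel A is silent, then loud; channel B is constant low-level.
chan_a = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]
chan_b = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
mixed = virtual_signal([chan_a, chan_b], frame_len=4)
```

In this toy case the second frame is taken almost entirely from channel A, whose SNR dominates, while the first frame falls back to an equal mix because neither channel rises above its own noise estimate. A real system would use longer, possibly overlapping frames and smooth the weights over time to avoid audible discontinuities.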
Keywords
Broadcast News; Evaluation Campaign; Speaker Model; Meeting Data; Cepstral Feature