Noise Subspace Fuzzy C-Means Clustering for Robust Speech Recognition
This paper develops a fuzzy C-means (FCM) approach to speech/non-speech discrimination as the basis for an effective voice activity detection (VAD) algorithm. The proposed VAD is a soft-decision clustering method built on a ratio of subband energies, and it improves recognition performance in noisy environments. Its accuracy stems from two elements: a decision function defined over a multiple-observation (MO) window of the averaged subband energy ratio, and the modeling of the noise subspace as fuzzy prototypes. The clustering approach is also time-efficient, which is essential in real-time VAD applications such as speech recognition. An exhaustive analysis on the Spanish SpeechDat-Car databases is conducted to assess the performance of the proposed method and to compare it with existing standard VAD methods. The results show improvements in detection accuracy over standard VADs and over a representative set of recently reported VAD algorithms.
Keywords: Speech Recognition, Discrete Fourier Transform, Speech Recognition System, Voice Activity Detection, IEEE Signal Processing Letter
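The abstract names three ingredients: averaging subband energies over an MO window, fitting fuzzy prototypes to the noise subspace, and a decision function built on them. The sketch below illustrates how these pieces might fit together using the textbook fuzzy C-means updates. It is not the paper's algorithm: the function names (`mo_average`, `fuzzy_c_means`, `is_speech`), the window half-length `N`, the fuzzifier `m`, and the rule "a frame is speech if it lies farther than `threshold` from every noise prototype" are all assumptions made for illustration.

```python
import numpy as np


def mo_average(E, N=2):
    """Average subband energies over a (2N+1)-frame window centred on each frame.

    E has shape (frames, subbands). Edges are padded by repeating the first
    and last frame (an assumption, chosen only to keep the output length).
    """
    padded = np.pad(E, ((N, N), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * N + 1].mean(axis=0)
                     for t in range(E.shape[0])])


def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means. Returns (prototypes, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Prototypes are membership-weighted means of the data.
        C = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every prototype, floored to avoid 0.
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        # u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return C, U


def is_speech(features, noise_prototypes, threshold):
    """Hypothetical decision rule: speech if far from all noise prototypes."""
    d = np.linalg.norm(features[:, None, :] - noise_prototypes[None, :, :],
                       axis=2)
    return d.min(axis=1) > threshold
```

In use, one would fit `fuzzy_c_means` on MO-averaged features taken from frames assumed to contain only noise, then call `is_speech` on incoming frames. The threshold would have to be tuned on held-out data; nothing in the abstract specifies its value.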