Noise Subspace Fuzzy C-Means Clustering for Robust Speech Recognition

  • Conference paper
Computational Science and Its Applications - ICCSA 2006 (ICCSA 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3984)

Abstract

This paper develops a fuzzy C-means (FCM) approach to speech/non-speech discrimination as the basis of an effective voice activity detection (VAD) algorithm. The proposed VAD uses soft-decision clustering over a ratio of subband energies, which improves recognition performance in noisy environments. The accuracy of the FCM-VAD algorithm comes from two elements: a decision function defined over a multiple-observation (MO) window of averaged subband energy ratios, and the modeling of the noise subspace as fuzzy prototypes. The clustering approach is also time-efficient, which is essential in real-time VAD applications such as speech recognition. An exhaustive analysis on the Spanish SpeechDat-Car databases assesses the performance of the proposed method and compares it with existing standard VAD methods. The results show improvements in detection accuracy over standard VADs and over a representative set of recently reported VAD algorithms.
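The abstract does not give the clustering details, so the following is only a minimal sketch of textbook fuzzy C-means (alternating membership and prototype updates), not the authors' FCM-VAD. The function names, the `init_centers` parameter, and the toy 2-D vectors standing in for noise-frame features are all assumptions made for illustration.

```python
import math
import random


def _memberships(points, centers, m):
    """Membership update: u_ij = d_ij^(-2/(m-1)) / sum_k d_ik^(-2/(m-1))."""
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if min(dists) == 0.0:
            # Point coincides with a prototype: share full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            rows.append([v / sum(inv) for v in inv])
    return rows


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6,
                  init_centers=None, seed=0):
    """Textbook fuzzy C-means on a list of equal-length feature vectors.

    Returns (centers, memberships); memberships[i][j] is the degree to which
    points[i] belongs to prototype j, and each row sums to 1. Requires m > 1.
    """
    dim = len(points[0])
    if init_centers is None:
        # Assumed default: start from randomly chosen data points.
        init_centers = random.Random(seed).sample(points, n_clusters)
    centers = [list(c) for c in init_centers]
    for _ in range(n_iter):
        u = _memberships(points, centers, m)
        # Prototype update: mean of the points weighted by u_ij^m.
        new_centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Recompute memberships so they match the returned prototypes.
    return centers, _memberships(points, centers, m)


# Hypothetical 2-D feature vectors standing in for noise-only frames.
noise_frames = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
                [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
centers, u = fuzzy_c_means(noise_frames, 2,
                           init_centers=[[1.0, 1.0], [4.0, 4.0]])
```

In a VAD setting the resulting prototypes could serve as a soft model of the noise, with incoming frames scored by their distance to the nearest prototype; how the paper actually builds its decision function over the MO window is not stated in the abstract.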

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Górriz, J.M., Ramírez, J., Segura, J.C., Puntonet, C.G., González, J.J. (2006). Noise Subspace Fuzzy C-Means Clustering for Robust Speech Recognition. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3984. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751649_85

  • DOI: https://doi.org/10.1007/11751649_85

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34079-9

  • Online ISBN: 978-3-540-34080-5

  • eBook Packages: Computer Science, Computer Science (R0)
