Noise Subspace Fuzzy C-Means Clustering for Robust Speech Recognition

Górriz, J. M.; Ramírez, J.; Segura, J. C.; Puntonet, C. G.; González, J. J.

doi:10.1007/11751649_85


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3984)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

This paper develops a fuzzy C-means (FCM) approach to speech/non-speech discrimination as the basis for an effective voice activity detection (VAD) algorithm. The proposed VAD uses soft-decision clustering over a ratio of subband energies, which improves recognition performance in noisy environments. Its accuracy stems from two elements: a decision function defined over a multiple-observation (MO) window of averaged subband energy ratios, and the modeling of the noise subspace as fuzzy prototypes. The clustering approach is also time-efficient, which is essential in real-time VAD applications such as speech recognition. An exhaustive analysis on the Spanish SpeechDat-Car databases assesses the performance of the proposed method and compares it with existing standard VAD methods. The results show improved detection accuracy over standard VADs and over a representative set of recently reported VAD algorithms.
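The abstract names the ingredients (fuzzy prototypes for the noise subspace, a decision function over subband energy ratios) but does not give their definitions. The following is therefore only a minimal sketch of textbook fuzzy C-means in the standard Bezdek formulation, not the authors' FCM-VAD. The function names `fuzzy_c_means` and `distance_to_noise`, the fuzzifier `m=2.0`, the Euclidean distance, and the idea of comparing a frame against its nearest noise prototype are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means. X has shape (n_samples, n_features).

    Returns (centers, U), where U[i, k] is the membership of sample i
    in cluster k and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized per sample.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Prototypes are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every prototype.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def distance_to_noise(frame, noise_centers):
    """Hypothetical helper: distance from one feature vector to the
    nearest noise prototype (an assumed decision statistic)."""
    return np.linalg.norm(noise_centers - frame, axis=1).min()
```

Under these assumptions, one could fit the prototypes on feature vectors taken from noise-only frames and flag a new frame as speech when `distance_to_noise` exceeds a tuned threshold. The paper's actual decision function over the MO window may differ.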

Author information

Authors and Affiliations

Dept. of Signal Theory, Networking and Communications, University of Granada, Spain
J. M. Górriz, J. Ramírez & J. C. Segura
Dept. of Computer Architecture and Technology, University of Granada, Spain
C. G. Puntonet & J. J. González

Editor information

Editors and Affiliations

Department of Computer Science, University of Calgary, 2500 University Drive N.W., T2N 1N4, Calgary, AB, Canada
Marina L. Gavrilova
Department of Mathematics and Computer Science, University of Perugia, via Vanvitelli, 1, I-06123, Perugia, Italy
Osvaldo Gervasi
William Norris Professor, Head of the Computer Science and Engineering Department, University of Minnesota, USA
Vipin Kumar
OptimaNumerics Ltd., Cathedral House 23-31 Waring Street, BT1 2DX, Belfast, UK
C. J. Kenneth Tan
Clayton School of IT, Monash University, 3800, Clayton, Australia
David Taniar
Department of Chemistry, University of Perugia, Via Elce di Sotto, 8, I-06123, Perugia, Italy
Antonio Laganá
School of Computing, Soongsil University, Seoul, Korea
Youngsong Mun
School of Information and Communication Engineering, Sungkyunkwan University, Korea
Hyunseung Choo

About this paper

Cite this paper

Górriz, J.M., Ramírez, J., Segura, J.C., Puntonet, C.G., González, J.J. (2006). Noise Subspace Fuzzy C-Means Clustering for Robust Speech Recognition. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3984. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751649_85

DOI: https://doi.org/10.1007/11751649_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34079-9
Online ISBN: 978-3-540-34080-5
eBook Packages: Computer Science, Computer Science (R0)