C-Means Clustering Applied to Speech Discrimination

  • J. M. Górriz
  • J. Ramírez
  • I. Turias
  • C. G. Puntonet
  • J. González
  • E. W. Lang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3991)

Abstract

An effective voice activity detection (VAD) algorithm is proposed for improving speech recognition performance in noisy environments. The proposed speech/pause discrimination method is based on a hard-decision clustering approach built over a set of subband log-energies. The presence of speech frames (a new cluster) is detected using a basic sequential algorithm scheme (BSAS) with a given “distance” (here, the geometrical distance) and a suitable threshold. The accuracy of the Cl-VAD algorithm stems from a decision function defined over a multiple-observation (MO) window of averaged subband log-energies and from modeling the noise subspace as a set of cluster prototypes. The clustering approach is also time-efficient, which is essential in real-time VAD applications such as speech recognition. An exhaustive analysis on the Spanish SpeechDat-Car databases is conducted to assess the performance of the proposed method and to compare it with existing standard VAD methods. The results show improvements in detection accuracy over standard VADs and a representative set of recently reported VAD algorithms.
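
The sequential clustering step named in the abstract can be illustrated with a minimal sketch of a generic basic sequential algorithm scheme (BSAS). This is not the authors' Cl-VAD implementation: the function name `bsas`, the `max_clusters` cap, the running-mean prototype update, and the toy two-band feature vectors are all assumptions made for illustration. A frame whose Euclidean distance to every existing prototype exceeds the threshold opens a new cluster; otherwise it is absorbed by the nearest one.

```python
import math


def bsas(vectors, threshold, max_clusters):
    """Generic BSAS with Euclidean distance (illustrative sketch only).

    Returns (prototypes, labels). Each prototype is the running mean
    of the vectors assigned to that cluster so far.
    """
    prototypes = []  # one mean vector per cluster
    counts = []      # number of vectors assigned to each cluster
    labels = []      # cluster index chosen for each input vector

    for x in vectors:
        if prototypes:
            dists = [math.dist(x, p) for p in prototypes]
            nearest = min(range(len(dists)), key=dists.__getitem__)
            far = dists[nearest] > threshold
        else:
            nearest, far = None, True  # first vector always opens a cluster

        if nearest is None or (far and len(prototypes) < max_clusters):
            # No prototype is close enough: start a new cluster.
            prototypes.append(list(x))
            counts.append(1)
            labels.append(len(prototypes) - 1)
        else:
            # Absorb the vector and update the running mean.
            counts[nearest] += 1
            n = counts[nearest]
            prototypes[nearest] = [
                p + (xi - p) / n for p, xi in zip(prototypes[nearest], x)
            ]
            labels.append(nearest)

    return prototypes, labels


# Hypothetical two-band log-energy frames: three low-energy frames
# and one high-energy frame that falls far from the first cluster.
frames = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [1.0, 0.9]]
prototypes, labels = bsas(frames, threshold=2.0, max_clusters=4)
```

In this toy run the third frame lies beyond the threshold from the first prototype, so it opens a second cluster, which mirrors the idea that a speech frame appears as a new cluster relative to the noise prototypes. The threshold value here is arbitrary; the abstract states only that a suitable threshold is used.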

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • J. M. Górriz (1)
  • J. Ramírez (1)
  • I. Turias (2)
  • C. G. Puntonet (3)
  • J. González (3)
  • E. W. Lang (4)
  1. Dpt. Signal Theory, Networking and Communications, University of Granada, Spain
  2. Dpt. Computer Science, University of Cádiz, Spain
  3. Dpt. Computer Architecture and Technology, University of Granada, Spain
  4. AG Neuro- und Bioinformatik, Universität Regensburg, Germany