Speech Event Detection Using Support Vector Machines

  • P. Yélamos
  • J. Ramírez
  • J. M. Górriz
  • C. G. Puntonet
  • J. C. Segura
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3991)


This work presents an effective speech event detector for improving the performance of speech processing systems working in noisy environments. The proposed method is based on a trained support vector machine (SVM) that defines an optimized non-linear decision rule involving the subband SNRs of the input speech. The classification rule is analyzed in the input space, together with the ability of the SVM model to learn how the signal is masked by the background noise. The algorithm also incorporates a noise reduction block working in tandem with the voice activity detector (VAD), which has been shown to be very effective in high-noise environments. The experimental analysis carried out on the Spanish SpeechDat-Car database shows clear improvements over standard VADs, including ITU G.729, ETSI AMR and ETSI AFE for distributed speech recognition (DSR), and over other recently reported VADs.
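
To make the idea of a non-linear decision rule over subband SNRs concrete, the following is a minimal sketch and not the authors' implementation. It assumes equal-width subbands, an RBF kernel (the abstract only says the rule is non-linear), and a tiny hand-made model; the names `subband_snrs`, `svm_decision`, `is_speech` and every numeric value are hypothetical, chosen only for illustration.

```python
import math


def subband_snrs(frame_power, noise_power, n_bands):
    """Return one SNR value in dB per subband.

    Assumption: the spectrum is split into n_bands equal-width bands
    (the last band absorbs any leftover bins).
    """
    if len(frame_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    width = len(frame_power) // n_bands
    snrs = []
    for k in range(n_bands):
        lo = k * width
        hi = len(frame_power) if k == n_bands - 1 else lo + width
        band_signal = max(sum(frame_power[lo:hi]), 1e-12)  # avoid log(0)
        band_noise = max(sum(noise_power[lo:hi]), 1e-12)
        snrs.append(10.0 * math.log10(band_signal / band_noise))
    return snrs


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel, assumed here as the non-linear kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Standard SVM decision function: sum_i coef_i * K(sv_i, x) + bias,
    where coef_i is the product of the multiplier and the class label."""
    return sum(
        c * rbf_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, dual_coefs)
    ) + bias


def is_speech(x, support_vectors, dual_coefs, bias, gamma):
    """Classify a frame as speech when the decision value is positive."""
    return svm_decision(x, support_vectors, dual_coefs, bias, gamma) > 0.0


# Hypothetical two-support-vector model in a 2-band SNR space:
# one noise-like point (low SNRs) and one speech-like point (high SNRs).
SUPPORT_VECTORS = [[0.0, 0.0], [10.0, 10.0]]
DUAL_COEFS = [-1.0, 1.0]
BIAS = 0.0
GAMMA = 0.05

features = subband_snrs([10.0, 10.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], 2)
print(features, is_speech(features, SUPPORT_VECTORS, DUAL_COEFS, BIAS, GAMMA))
```

In a real system the support vectors, coefficients, bias and kernel width would come from training on labeled speech and non-speech frames rather than being written by hand.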


  • Support Vector Machine
  • False Alarm Rate
  • Support Vector Machine Model
  • Voice Activity Detector
  • Sequential Minimal Optimization
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • P. Yélamos (1)
  • J. Ramírez (1)
  • J. M. Górriz (1)
  • C. G. Puntonet (2)
  • J. C. Segura (1)

  1. Dept. of Signal Theory, Networking and Communications, University of Granada, Spain
  2. Dept. of Architecture and Computer Technology, University of Granada, Spain
