Cognitive Computation, Volume 2, Issue 3, pp 191–198

A Non-Linear VAD for Noisy Environments



This paper deals with non-linear transformations for improving the performance of an entropy-based voice activity detector (VAD). The idea of using a non-linear transformation has already been applied in speech linear prediction (linear predictive coding) based on source separation techniques, where a score function is added to the classical equations to take into account the true distribution of the signal. We explore the possibility of estimating the entropy of frames after calculating their score function, instead of using the original frames. We observe that if the signal is clean, the estimated entropy is essentially the same; if the signal is noisy, however, the frames transformed with the score function may yield an entropy that differs between voiced and unvoiced frames. Experimental evidence shows that this difference enables voice activity detection under high noise, where the simple entropy method fails.
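The pipeline described above can be illustrated with a minimal sketch; it is not the authors' implementation. The abstract does not say how the entropy or the score function is estimated, so everything below is an assumption made for illustration: the amplitude-histogram entropy, the Gaussian kernel density estimate used to approximate the score function ψ(x) = −p′(x)/p(x), the rule-of-thumb bandwidth, the frame length, and the function names `histogram_entropy`, `score_function`, and `frame_entropies`.

```python
import numpy as np


def histogram_entropy(x, bins=32):
    """Shannon entropy (in nats) of the amplitude histogram of x (assumed estimator)."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))


def score_function(x, bandwidth=None):
    """Estimate psi(x) = -p'(x)/p(x) at each sample of x.

    The density p is approximated with a Gaussian kernel density estimate;
    this is one possible estimator, chosen here only for illustration.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if bandwidth is None:
        # Normal-reference rule of thumb; the small constant avoids division by zero.
        bandwidth = 1.06 * x.std() * n ** (-0.2) + 1e-12
    d = (x[:, None] - x[None, :]) / bandwidth  # scaled pairwise differences
    k = np.exp(-0.5 * d ** 2)                  # Gaussian kernel values
    p = k.sum(axis=1)                          # proportional to p(x_i)
    dp = (-d * k).sum(axis=1) / bandwidth      # proportional to p'(x_i), same constant
    return -dp / p


def frame_entropies(signal, frame_len=256, use_score=False):
    """Entropy of each non-overlapping frame, optionally after the score transform."""
    signal = np.asarray(signal, dtype=float)
    n_frames = signal.size // frame_len
    out = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        if use_score:
            frame = score_function(frame)
        out[i] = histogram_entropy(frame)
    return out
```

A detector along these lines would compare `frame_entropies(x, use_score=True)` against a threshold to label frames as speech or non-speech; how that threshold is chosen is not stated in the abstract and is left open here.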


Keywords: VAD; Score function; Entropy; Speech



This work has been supported by the University of Vic under grants R0904 and R0912, and by the Ministry of Science and Innovation of Spain (MICINN) under grant TEC2008-02717-E/TEC. The authors thank two anonymous referees for helpful comments that have led to improvements in the paper.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Digital Technologies Group, University of Vic, Vic, Spain
