
A Non-Linear VAD for Noisy Environments

Cognitive Computation

Abstract

This paper studies non-linear transformations as a means of improving the performance of an entropy-based voice activity detector (VAD). A non-linear transformation of this kind has already been used in speech linear prediction (linear predictive coding) based on source separation techniques, where a score function is added to the classical equations to take into account the true distribution of the signal. We explore estimating the entropy of each frame after applying the score function to it, instead of using the original frame. We observe that if the signal is clean, the estimated entropy is essentially the same; if the signal is noisy, however, frames transformed by the score function may yield an entropy that differs between voiced and unvoiced frames. Experimental evidence shows that this difference enables voice activity detection under high noise, where the simple entropy method fails.
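The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the score function is the usual one from source separation, psi(x) = -p'(x)/p(x), estimated per frame with a Gaussian kernel density estimate, and that frame entropy is estimated from a histogram. The function names, the bandwidth rule, the frame length, the hop size and the number of bins are all hypothetical choices made for illustration. The voiced/unvoiced decision rule is left out, since the abstract does not state it.

```python
import numpy as np

def score_function(frame, bandwidth=None):
    """Estimate psi(x) = -p'(x)/p(x) at each sample of a frame.

    Assumption: p is a Gaussian kernel density estimate of the frame.
    """
    x = np.asarray(frame, dtype=float)
    n = x.size
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an illustrative choice, not from the paper).
        bandwidth = 1.06 * x.std() * n ** (-0.2) + 1e-12
    diff = (x[:, None] - x[None, :]) / bandwidth   # pairwise scaled differences
    kernel = np.exp(-0.5 * diff ** 2)
    density = kernel.sum(axis=1)                   # proportional to p(x_i)
    derivative = (-diff * kernel).sum(axis=1) / bandwidth  # proportional to p'(x_i)
    # The common normalising constant cancels in the ratio.
    return -derivative / density

def frame_entropy(values, bins=32):
    """Shannon entropy (in nats) of a histogram of the given values."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def entropy_features(signal, frame_len=256, hop=128, use_score=True, bins=32):
    """Per-frame entropy, computed on raw or score-transformed frames."""
    signal = np.asarray(signal, dtype=float)
    n_frames = max(0, 1 + (signal.size - frame_len) // hop)
    feats = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        values = score_function(frame) if use_score else frame
        feats[i] = frame_entropy(values, bins)
    return feats
```

Calling `entropy_features(signal, use_score=False)` gives the plain entropy feature, and `use_score=True` gives the transformed one, so the two can be compared frame by frame on clean and noisy recordings.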

Acknowledgments

This work has been supported by the University of Vic under grants R0904 and R0912, and by the Ministry of Science and Innovation of Spain (MICINN) under grant TEC2008-02717-E/TEC. The authors thank two anonymous referees for helpful comments that have led to improvements in the paper.

Author information

Corresponding author

Correspondence to Jordi Solé-Casals.

About this article

Cite this article

Solé-Casals, J., Zaiats, V. A Non-Linear VAD for Noisy Environments. Cogn Comput 2, 191–198 (2010). https://doi.org/10.1007/s12559-010-9037-4

  • DOI: https://doi.org/10.1007/s12559-010-9037-4
