Score Function for Voice Activity Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5933)

Abstract

In this paper we explore the use of non-linear transformations to improve the performance of an entropy-based voice activity detector (VAD). The idea of using a non-linear transformation comes from previous work in the field of speech linear prediction (LPC) based on source separation techniques, where the score function was added to the classical equations in order to take into account the real distribution of the signal. We explore the possibility of estimating the entropy of frames after calculating their score function, instead of using the original frames. We observe that if the signal is clean, the estimated entropy is essentially the same; but if the signal is noisy, the frames transformed with the score function yield different entropy values for voiced and unvoiced frames. Experimental results show that this makes it possible to detect voice activity under high noise, where the simple entropy method fails.
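
The processing chain described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the details, so the following are all assumptions: the score function is taken as psi(x) = -p'(x)/p(x) and estimated per frame with a Gaussian kernel density estimate using Silverman's rule-of-thumb bandwidth; the entropy is the Shannon entropy of the normalised power spectrum; and the function names, frame length and hop size are hypothetical. The decision threshold applied to the entropy values is left out.

```python
import numpy as np


def spectral_entropy(frame):
    """Shannon entropy (in nats) of the normalised power spectrum of one frame."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    total = power.sum()
    if total == 0:
        return 0.0
    p = power / total
    p = p[p > 0]  # skip empty bins so that log() is defined
    return float(-(p * np.log(p)).sum())


def score_function(frame, bandwidth=None):
    """Estimate psi(x) = -p'(x)/p(x) at every sample of the frame.

    Assumption: p is a Gaussian kernel density estimate built from the
    frame's own samples; the bandwidth defaults to Silverman's rule.
    """
    x = np.asarray(frame, dtype=float)
    std = x.std()
    if std == 0:
        return np.zeros_like(x)  # constant frame: flat density, zero score
    if bandwidth is None:
        bandwidth = 1.06 * std * x.size ** (-0.2)
    u = (x[:, None] - x[None, :]) / bandwidth  # u[i, j] = (x_i - x_j) / h
    kernel = np.exp(-0.5 * u ** 2)
    density = kernel.sum(axis=1)                   # proportional to p(x_i)
    d_density = -(u * kernel).sum(axis=1) / bandwidth  # proportional to p'(x_i)
    return -d_density / density


def frame_entropies(signal, frame_len=256, hop=128, use_score=True):
    """Entropy of each frame, optionally after the score-function transform."""
    signal = np.asarray(signal, dtype=float)
    values = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        if use_score:
            frame = score_function(frame)
        values.append(spectral_entropy(frame))
    return np.array(values)
```

Calling `frame_entropies(x, use_score=False)` and `frame_entropies(x, use_score=True)` on the same recording gives the two entropy tracks that the abstract compares; a detector would then threshold whichever track separates speech from noise better.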

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Solé-Casals, J., Martí-Puig, P., Reig-Bolaño, R., Zaiats, V. (2010). Score Function for Voice Activity Detection. In: Solé-Casals, J., Zaiats, V. (eds) Advances in Nonlinear Speech Processing. NOLISP 2009. Lecture Notes in Computer Science, vol 5933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11509-7_10

  • DOI: https://doi.org/10.1007/978-3-642-11509-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11508-0

  • Online ISBN: 978-3-642-11509-7

  • eBook Packages: Computer Science, Computer Science (R0)
