Score Function for Voice Activity Detection

Solé-Casals, Jordi; Martí-Puig, Pere; Reig-Bolaño, Ramon; Zaiats, Vladimir

doi:10.1007/978-3-642-11509-7_10

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5933)

Abstract

In this paper we explore the use of non-linear transformations to improve the performance of an entropy-based voice activity detector (VAD). The idea of using a non-linear transformation comes from previous work in speech linear prediction (LPC) based on source separation techniques, where the score function was added to the classical equations in order to take into account the real distribution of the signal. We explore the possibility of estimating the entropy of frames after calculating their score function, instead of using the original frames. We observe that if the signal is clean, the estimated entropy is essentially the same; but if the signal is noisy, the transformed frames (with the score function) yield different entropy values for voiced and unvoiced frames. Experimental results show that this makes it possible to detect voice activity under high noise, where the simple entropy method fails.
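
The idea in the abstract can be illustrated with a minimal sketch. The full text is not available here, so everything specific below is an assumption and not the authors' implementation: the score function is taken as psi(x) = -p'(x)/p(x) and estimated with a Gaussian kernel density, the bandwidth follows Silverman's rule of thumb, entropy is the Shannon entropy of a histogram of the frame samples, and the names `score_function`, `frame_entropy` and `entropy_tracks` are hypothetical.

```python
import numpy as np

def score_function(frame, bandwidth=None):
    """Kernel estimate of psi(x) = -p'(x)/p(x) at each sample of the frame.

    Assumption: Gaussian kernel, Silverman's rule-of-thumb bandwidth.
    """
    x = np.asarray(frame, dtype=float)
    n = x.size
    if bandwidth is None:
        bandwidth = 1.06 * x.std() * n ** (-0.2) + 1e-12
    d = (x[:, None] - x[None, :]) / bandwidth  # pairwise scaled differences
    k = np.exp(-0.5 * d ** 2)                  # Gaussian kernel, constants cancel
    p = k.sum(axis=1)                          # proportional to the density p(x)
    dp = (-d / bandwidth * k).sum(axis=1)      # proportional to p'(x)
    return -dp / p

def frame_entropy(frame, bins=50):
    """Shannon entropy (bits) of a histogram of the frame samples (assumed estimator)."""
    counts, _ = np.histogram(frame, bins=bins)
    prob = counts[counts > 0] / counts.sum()
    return float(-(prob * np.log2(prob)).sum())

def entropy_tracks(signal, frame_len=256, hop=128):
    """Per-frame entropy of the original frames and of their score-transformed versions."""
    signal = np.asarray(signal, dtype=float)
    raw, transformed = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        raw.append(frame_entropy(frame))
        transformed.append(frame_entropy(score_function(frame)))
    return np.array(raw), np.array(transformed)
```

A detector along these lines would compare the transformed entropy track against a threshold to label frames as speech or non-speech. The threshold, the frame length and the hop size would have to be tuned on real data, and none of these values come from the paper.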

Author information

Authors and Affiliations

Digital Technologies Group, University of Vic, Sagrada Família 7, 08500, Vic, Spain
Jordi Solé-Casals, Pere Martí-Puig, Ramon Reig-Bolaño & Vladimir Zaiats

Editor information

Editors and Affiliations

Department of Computer Science, Escola Politecnica Superior, Universitat de Vic, c/ Sagrada Familia, 7, 08500, Vic (Barcelona), Spain
Jordi Solé-Casals
Department of Computer Science, Escola Politecnica Superior, Universitat de Vic, c/ Sagrada Familia, 7, 08500, Vic (Barcelona), Spain
Vladimir Zaiats

About this paper

Cite this paper

Solé-Casals, J., Martí-Puig, P., Reig-Bolaño, R., Zaiats, V. (2010). Score Function for Voice Activity Detection. In: Solé-Casals, J., Zaiats, V. (eds) Advances in Nonlinear Speech Processing. NOLISP 2009. Lecture Notes in Computer Science, vol 5933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11509-7_10

DOI: https://doi.org/10.1007/978-3-642-11509-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11508-0
Online ISBN: 978-3-642-11509-7
eBook Packages: Computer Science, Computer Science (R0)
