Exploiting Contextual Information for Speech/Non-Speech Detection

Krishnan Parthasarathi, Sree Hari; Motlíček, Petr; Hermansky, Hynek

doi:10.1007/978-3-540-87391-4_58

Sree Hari Krishnan Parthasarathi¹,
Petr Motlíček¹ &
Hynek Hermansky¹

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

In this paper, we investigate the effect of temporal context on speech/non-speech detection (SND). We show that even a simple feature such as full-band energy, when used with a sufficiently long context, is promising enough to merit further investigation. Evaluated on the test data set using the F-measure, the proposed method shows absolute performance gains of 4.4% and 5.4% compared with a state-of-the-art multi-layer perceptron (MLP) based SND system and a simple energy-threshold SND method, respectively. The optimal context length was found to be 1000 ms. Further numerical optimization yields an additional improvement of 3.37% absolute, for total absolute gains of 7.77% and 8.77% over the MLP-based and energy-based methods, respectively. ROC-based evaluation also shows promising performance for the proposed method, particularly in low-SNR conditions.
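
The abstract does not say how the temporal context is combined with the full-band energy feature, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a centred moving average of per-frame log energy over the context window, followed by a fixed threshold; the 25 ms frame length, 10 ms hop, the -30 dB threshold and all function names are hypothetical choices made for this example. A frame-level F-measure helper is included because that is the metric the abstract reports.

```python
import math

def frame_log_energy(signal, frame_len, hop):
    """Full-band log energy (dB) per frame; floored to avoid log(0)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(max(power, 1e-12)))
    return energies

def add_context(values, context_frames):
    """Average each frame with its neighbours in a centred window.

    Assumption: a plain moving average stands in for whatever
    context model the paper actually uses.
    """
    half = context_frames // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def detect_speech(signal, sample_rate, context_ms=1000,
                  frame_ms=25, hop_ms=10, threshold_db=-30.0):
    """Return one speech/non-speech decision (True/False) per frame."""
    frame_len = sample_rate * frame_ms // 1000
    hop = sample_rate * hop_ms // 1000
    energies = frame_log_energy(signal, frame_len, hop)
    context_frames = max(1, context_ms // hop_ms)
    smoothed = add_context(energies, context_frames)
    return [e > threshold_db for e in smoothed]

def f_measure(predicted, reference):
    """Frame-level F-measure: harmonic mean of precision and recall."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if r and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With `context_ms=1000` and a 10 ms hop, each decision in this sketch averages roughly 100 neighbouring frames, which suppresses short energy bursts at the cost of blurring segment boundaries.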

Author information

Authors and Affiliations

IDIAP Research Institute, Martigny; Swiss Federal Institute of Technology at Lausanne (EPFL), Switzerland
Sree Hari Krishnan Parthasarathi, Petr Motlíček & Hynek Hermansky

Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala

About this paper

Cite this paper

Krishnan Parthasarathi, S.H., Motlíček, P., Hermansky, H. (2008). Exploiting Contextual Information for Speech/Non-Speech Detection. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_58

DOI: https://doi.org/10.1007/978-3-540-87391-4_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)
