Bio-inspired voice activity detector based on the human speech properties in the modulation domain

  • Conference paper

Abstract

Many conventional voice activity detection (VAD) methods assume that the speech signal is acquired in high quality. This paper describes a robust VAD method for speech signals recorded in noisy environments. The proposed scheme exploits the properties of the modulation spectrum of human speech. The speech signal is split into frequency bands and filtered in the modulation frequency domain to reduce the noise level in advance. The spectral energy is then evaluated and a noise threshold is calculated. The proposed method provides robust speech detection in a varying noise environment and can be used in speech enhancement and speech coding algorithms. Its characteristics were investigated with different types of noisy speech.
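The processing chain named in the abstract (band splitting, modulation-domain filtering, energy evaluation, noise threshold) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no parameters, so the frame length, the four equal-width bands, the 1 to 16 Hz modulation passband, the one-pole smoothers, the mean-plus-`k`-standard-deviations threshold, and the assumption that the first frames contain noise only are all illustrative choices, and every function name is hypothetical.

```python
import cmath
import math
import random


def band_energies(signal, frame_len=64, hop=32, n_bands=4):
    """Split the signal into frames and return per-frame energy in each band."""
    half = frame_len // 2
    per_band = half // n_bands
    # Twiddle factors for a naive DFT (slow, but dependency-free).
    twiddle = [[cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)] for k in range(half)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = [abs(sum(x * w for x, w in zip(frame, row))) ** 2
                 for row in twiddle]
        frames.append([sum(power[b * per_band:(b + 1) * per_band])
                       for b in range(n_bands)])
    return frames


def modulation_bandpass(track, frame_rate, low_hz=1.0, high_hz=16.0):
    """Band-pass one envelope trajectory: fast smoother minus slow smoother."""
    a_fast = 1.0 - math.exp(-2.0 * math.pi * high_hz / frame_rate)
    a_slow = 1.0 - math.exp(-2.0 * math.pi * low_hz / frame_rate)
    fast = slow = track[0]
    out = []
    for value in track:
        fast += a_fast * (value - fast)   # follows modulations below high_hz
        slow += a_slow * (value - slow)   # follows the slowly varying level
        out.append(fast - slow)           # stationary noise level cancels out
    return out


def detect_speech(signal, sample_rate, hop=32, noise_frames=60, k=3.0):
    """Return one boolean per frame: True where speech is assumed present."""
    frames = band_energies(signal, hop=hop)
    frame_rate = sample_rate / hop
    n_bands = len(frames[0])
    filtered = [modulation_bandpass([f[b] for f in frames], frame_rate)
                for b in range(n_bands)]
    # Modulation energy per frame, summed over all bands.
    energy = [sum(filtered[b][t] ** 2 for b in range(n_bands))
              for t in range(len(frames))]
    # Assumption: the first `noise_frames` frames contain noise only.
    lead = energy[:noise_frames]
    mean = sum(lead) / len(lead)
    std = math.sqrt(sum((e - mean) ** 2 for e in lead) / len(lead))
    threshold = mean + k * std
    return [e > threshold for e in energy]


# Toy input: white noise throughout; from 1.5 s onward a 500 Hz tone with
# 4 Hz amplitude modulation stands in for speech.
random.seed(0)
fs = 4000
noisy = []
for n in range(3 * fs):
    t = n / fs
    sample = random.gauss(0.0, 0.1)
    if t >= 1.5:
        envelope = 0.15 * (1.0 + math.sin(2.0 * math.pi * 4.0 * t))
        sample += envelope * math.sin(2.0 * math.pi * 500.0 * t)
    noisy.append(sample)

flags = detect_speech(noisy, fs)
noise_rate = sum(flags[60:180]) / 120    # frames that lie fully before 1.5 s
speech_rate = sum(flags[190:370]) / 180  # frames that lie fully after 1.5 s
print(f"flagged in noise: {noise_rate:.2f}, flagged in speech: {speech_rate:.2f}")
```

Subtracting the slow smoother from the fast one removes the stationary part of each band envelope, which is where steady noise concentrates, while keeping the slow modulations around the syllabic rate that are typical of speech. A side effect of this simple filter is a short tail of detections after speech ends, because the slow smoother needs time to settle back to the noise level.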

This work was supported in part by Bialystok Technical University under grant W/WI/2/04.

© 2005 Springer Science+Business Media, Inc.

About this paper

Cite this paper

Shadevsky, A., Petrovsky, A. (2005). Bio-inspired voice activity detector based on the human speech properties in the modulation domain. In: Saeed, K., Pejaś, J. (eds) Information Processing and Security Systems. Springer, Boston, MA. https://doi.org/10.1007/0-387-26325-X_5

  • DOI: https://doi.org/10.1007/0-387-26325-X_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-25091-5

  • Online ISBN: 978-0-387-26325-0

  • eBook Packages: Computer Science, Computer Science (R0)
