
Recent Developments, Challenges, and Future Scope of Voice Activity Detection Schemes—A Review

  • Conference paper
Information and Communication Technology for Competitive Strategies (ICTCS 2020)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 190)

Abstract

Voice Activity Detection (VAD) is a technique that classifies segments of an audio signal as either speech or background noise. It is widely used in emerging speech technologies such as mobile communication, high-quality multimedia transmission, forensic science, and voice recognition applications. Because VAD is an integral part of speech communication systems, selecting a precise VAD is the most challenging part of the design in terms of computational complexity, feature extraction, threshold selection, and percentage of correct decisions. Researchers have generally classified VAD schemes as supervised or unsupervised and have introduced various feature-based algorithms to detect the presence of speech. However, a comprehensive study is needed to guide the selection of appropriate techniques from existing VAD schemes, to summarize their challenges and solutions, and to set future research directions in the emerging area of voice recognition. This manuscript therefore presents an extensive study focused in particular on the tradeoff between the limitations and the performance of previously developed VAD schemes. The authors believe that this review will be helpful to researchers working in the challenging domain of speech processing and recognition.
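The abstract names feature extraction, threshold selection, and percentage of correctness as the key VAD design factors. The minimal sketch below, which is not taken from the paper, illustrates them with a classic unsupervised energy-based detector: short-time energy is the feature, the threshold is a noise-floor estimate plus a decibel margin, and accuracy is the percentage of frames labeled correctly. The 8 kHz sampling rate, 160-sample frames, the assumption that the first five frames contain only noise, the 6 dB margin, the synthetic tone standing in for speech, and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def frame_energies(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Short-time energy (sum of squared samples) of each full frame."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def energy_vad(
    signal: Sequence[float],
    frame_len: int = 160,    # 20 ms at an assumed 8 kHz sampling rate
    hop: int = 160,          # non-overlapping frames
    noise_frames: int = 5,   # ASSUMPTION: the first frames contain noise only
    margin_db: float = 6.0,  # ASSUMPTION: decision margin above the noise floor
) -> List[int]:
    """Label each frame 1 (speech) or 0 (background noise)."""
    energies = frame_energies(signal, frame_len, hop)
    if len(energies) < noise_frames:
        raise ValueError("signal too short to estimate the noise floor")
    # Unsupervised threshold: mean noise-only energy raised by margin_db decibels.
    noise_floor = max(sum(energies[:noise_frames]) / noise_frames, 1e-12)
    threshold = noise_floor * 10 ** (margin_db / 10)
    return [1 if e > threshold else 0 for e in energies]


def frame_accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Percentage of frames whose speech/noise label matches the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


# Synthetic test signal (8 kHz): low-level noise, a 200 Hz tone standing in
# for speech, then noise again. Segment lengths are multiples of 160 samples.
sr = 8000
noise = [0.01 * (-1) ** n for n in range(800)]
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
sig = noise + tone + noise[:480]

labels = energy_vad(sig)
reference = [0] * 5 + [1] * 5 + [0] * 3
print(labels)                             # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
print(frame_accuracy(labels, reference))  # 100.0
```

A fixed energy threshold like this works on clean synthetic input but degrades at low signal-to-noise ratios and with non-stationary noise, which is why the schemes surveyed here move toward richer features, adaptive thresholds, and supervised classifiers.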



Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sharma, S., Rattan, P., Sharma, A. (2021). Recent Developments, Challenges, and Future Scope of Voice Activity Detection Schemes—A Review. In: Kaiser, M.S., Xie, J., Rathore, V.S. (eds) Information and Communication Technology for Competitive Strategies (ICTCS 2020). Lecture Notes in Networks and Systems, vol 190. Springer, Singapore. https://doi.org/10.1007/978-981-16-0882-7_39