Recent Developments, Challenges, and Future Scope of Voice Activity Detection Schemes—A Review

Sharma, Shilpa; Rattan, Punam; Sharma, Anurag

doi:10.1007/978-981-16-0882-7_39

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 190)

Abstract

Voice Activity Detection (VAD) is a technique that classifies an audio signal into two parts, speech and background noise. It is widely used in speech technologies such as mobile communication, high-quality multimedia transmission, forensic science, and voice recognition applications. Because VAD is an integral part of any speech communication system, selecting a precise VAD scheme is challenging in terms of complexity, feature extraction, threshold selection, and percentage of correctness. Researchers have generally classified VAD schemes as supervised or unsupervised and have introduced various characteristic-based algorithms to indicate the presence of speech. However, a comprehensive study is needed to guide the selection of appropriate techniques from existing VAD schemes, together with their challenges and solutions, in order to set future research directions in the emerging area of voice recognition. This manuscript therefore presents an extensive study that weighs the obstacles against the performance of previously developed VAD schemes. The authors believe that this review will be helpful to researchers working in the challenging speech processing and recognition domain.
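
As a rough illustration of the feature extraction and threshold selection steps named in the abstract, the following minimal sketch labels frames as speech or noise using short-time energy. It is not taken from the chapter: the function names `frame_energies` and `energy_vad`, the parameters `frame_len`, `noise_frames`, and `factor`, and the assumption that the first few frames contain only background noise are all hypothetical choices made for this example.

```python
import math


def frame_energies(samples, frame_len):
    """Return the mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_vad(samples, frame_len=160, noise_frames=5, factor=4.0):
    """Label each frame True (speech) or False (background noise).

    Assumption (illustrative only): the first `noise_frames` frames hold
    background noise, so their mean energy estimates the noise floor.
    """
    energies = frame_energies(samples, frame_len)
    n = min(noise_frames, len(energies))
    if n == 0:
        return []
    noise_floor = sum(energies[:n]) / n
    threshold = factor * noise_floor  # simple fixed-multiple threshold
    return [e > threshold for e in energies]


# Synthetic test signal at 8 kHz: 0.1 s of a quiet 200 Hz tone standing in
# for background noise, followed by 0.1 s of a louder tone standing in for speech.
rate = 8000
quiet = [0.01 * math.sin(2 * math.pi * 200 * t / rate) for t in range(800)]
loud = [0.5 * math.sin(2 * math.pi * 200 * t / rate) for t in range(800, 1600)]
labels = energy_vad(quiet + loud)
print(labels)
```

With 20 ms frames (160 samples at 8 kHz), the first five frames fall below the threshold and the last five exceed it. A fixed multiple of an initial noise estimate is the simplest possible threshold rule; it would likely degrade under non-stationary noise, which is one reason more robust features and adaptive thresholds are studied.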

Author information

Authors and Affiliations

Department of CSE, CT University, Ludhiana, India
Shilpa Sharma
Department of Computer Application, CT University, Ludhiana, India
Punam Rattan
Department of Computer Science and Engineering, GNA University, Phagwara, India
Anurag Sharma

Editor information

Editors and Affiliations

Jahangirnagar University, Dhaka, Bangladesh
M. Shamim Kaiser
Shaanxi Normal University, Xi’an, China
Juanying Xie
IIS Deemed to be University, Jaipur, Rajasthan, India
Vijay Singh Rathore

Cite this paper

Sharma, S., Rattan, P., Sharma, A. (2021). Recent Developments, Challenges, and Future Scope of Voice Activity Detection Schemes—A Review. In: Kaiser, M.S., Xie, J., Rathore, V.S. (eds) Information and Communication Technology for Competitive Strategies (ICTCS 2020). Lecture Notes in Networks and Systems, vol 190. Springer, Singapore. https://doi.org/10.1007/978-981-16-0882-7_39

DOI: https://doi.org/10.1007/978-981-16-0882-7_39
Published: 06 July 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0881-0
Online ISBN: 978-981-16-0882-7
eBook Packages: Intelligent Technologies and Robotics (R0)
