An Unsupervised Voice Activity Detection Using Time-Frequency Features

Ait Mait, Hind; Aboutabit, Noureddine

doi:10.1007/978-3-031-29313-9_21


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 656)

Included in the following conference series:

International Conference of Machine Learning and Computer Science Applications

Abstract

Voice Activity Detection (VAD) is a binary classification problem that separates speech segments from background silence or noise. Many features have been proposed for VAD over time. In our study, we applied two features: Short Time Energy (STE) and Spectral Centroid, which belong to the temporal and frequency domains, respectively. The goal of this VAD method is to supply the extracted speech segments as input to a phonetic segmentation method for Arabic and Moroccan dialect speech signals. We evaluated the method on 400 sentences from the Arabphone corpus recorded in noisy and noiseless environments. The results showed a promising accuracy of 85%, which is comparable to some previously proposed methods.
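The two features named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state frame sizes or the thresholding rule, so the 25 ms frame, the 10 ms hop, the Hamming window, the "fraction of the mean" thresholds, and the rule that a speech frame must exceed both thresholds are all assumptions. The function names (`frame_signal`, `short_time_energy`, `spectral_centroid`, `simple_vad`) are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def short_time_energy(frames):
    """Temporal feature: mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)


def spectral_centroid(frames, sr):
    """Frequency feature: magnitude-weighted mean frequency (Hz) of each frame."""
    window = np.hamming(frames.shape[1])  # assumed window choice
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    # The small constant avoids division by zero on all-zero frames.
    return (mag @ freqs) / (mag.sum(axis=1) + 1e-12)


def simple_vad(x, sr, frame_ms=25, hop_ms=10, ratio=0.5):
    """Return a boolean mask with True for frames flagged as speech.

    Assumption: a frame counts as speech when BOTH features exceed
    `ratio` times their mean over the recording. This rule is only
    illustrative and is not taken from the paper.
    """
    frame_len = sr * frame_ms // 1000
    hop = sr * hop_ms // 1000
    frames = frame_signal(x, frame_len, hop)
    energy = short_time_energy(frames)
    centroid = spectral_centroid(frames, sr)
    return (energy > ratio * energy.mean()) & (centroid > ratio * centroid.mean())
```

Calling `simple_vad(x, sr)` on a mono signal `x` sampled at `sr` Hz yields one decision per frame. Consecutive `True` frames can then be merged into speech segments for a downstream step such as phonetic segmentation.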

Supported by LIPIM.



Author information

Authors and Affiliations

Laboratory of Process Engineering, Computer Science and Mathematics, National School of Applied Sciences Khouribga, University Sultan Moulay Slimane, Beni Mellal, Morocco
Hind Ait Mait & Noureddine Aboutabit

Corresponding author

Correspondence to Hind Ait Mait.

Editor information

Editors and Affiliations

National School of Applied Sciences of Khouribga, Sultan Moulay Slimane University, Khouribga, Morocco
Noureddine Aboutabit
ENSIAS, Mohammed V University, Rabat, Morocco
Mohamed Lazaar
National School of Applied Sciences of Khouribga, Sultan Moulay Slimane University, Khouribga, Morocco
Imad Hafidi

About this paper

Cite this paper

Ait Mait, H., Aboutabit, N. (2023). An Unsupervised Voice Activity Detection Using Time-Frequency Features. In: Aboutabit, N., Lazaar, M., Hafidi, I. (eds) Advances in Machine Intelligence and Computer Science Applications. ICMICSA 2022. Lecture Notes in Networks and Systems, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-031-29313-9_21

DOI: https://doi.org/10.1007/978-3-031-29313-9_21
Published: 07 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28845-6
Online ISBN: 978-3-031-29313-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
