Abstract
Voice Activity Detection (VAD) is a binary classification problem: separating speech segments from background silence or noise. Many features have been proposed for VAD over time. In this study, we apply two of them: Short-Time Energy (STE), a temporal-domain feature, and Spectral Centroid, a frequency-domain feature. The extracted speech segments are intended as input to a phonetic segmentation method for Arabic and Moroccan dialect speech signals. We evaluated the method on 400 sentences from the Arabphone corpus, recorded in both noisy and noiseless environments. The method reached an accuracy of 85%, which is comparable to some previously proposed methods.
Supported by LIPIM.
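The two features named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of short-time energy and spectral centroid; the frame length, hop size, Hamming window, and the fixed-threshold rule in `label_frames` are assumptions made for illustration and are not taken from the paper, whose thresholding scheme is not described in the abstract.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def short_time_energy(frames):
    """Mean squared amplitude of each frame (temporal-domain feature)."""
    return np.mean(frames ** 2, axis=1)


def spectral_centroid(frames, sample_rate):
    """Magnitude-weighted mean frequency in Hz of each frame."""
    window = np.hamming(frames.shape[1])  # assumed window choice
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sample_rate)
    total = mag.sum(axis=1)
    # Guard against division by zero on all-zero (digitally silent) frames.
    return (mag @ freqs) / np.maximum(total, 1e-12)


def label_frames(ste, centroid, ste_thr, centroid_thr):
    """Hypothetical decision rule: speech if both features exceed a threshold."""
    return (ste > ste_thr) & (centroid > centroid_thr)


if __name__ == "__main__":
    sr = 16000                      # assumed sampling rate
    rng = np.random.default_rng(0)
    noise = 0.01 * rng.standard_normal(sr // 2)
    t = np.arange(sr // 2) / sr
    tone = np.sin(2 * np.pi * 1000 * t)   # stand-in for a voiced segment
    x = np.concatenate([noise, tone])

    frames = frame_signal(x, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
    ste = short_time_energy(frames)
    sc = spectral_centroid(frames, sr)
    speech = label_frames(ste, sc, ste_thr=0.1, centroid_thr=500.0)
    print(f"{speech.sum()} of {len(speech)} frames labelled as speech")
```

In practice the thresholds would be estimated from the signal itself rather than fixed by hand; the constants above only serve to make the example run on a synthetic signal.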
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ait Mait, H., Aboutabit, N. (2023). An Unsupervised Voice Activity Detection Using Time-Frequency Features. In: Aboutabit, N., Lazaar, M., Hafidi, I. (eds) Advances in Machine Intelligence and Computer Science Applications. ICMICSA 2022. Lecture Notes in Networks and Systems, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-031-29313-9_21
DOI: https://doi.org/10.1007/978-3-031-29313-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28845-6
Online ISBN: 978-3-031-29313-9
eBook Packages: Intelligent Technologies and Robotics (R0)