
An Unsupervised Voice Activity Detection Using Time-Frequency Features

  • Conference paper
Advances in Machine Intelligence and Computer Science Applications (ICMICSA 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 656)


Abstract

Voice Activity Detection (VAD) is a binary classification problem: separating speech segments from background silence or noise. Many features have been proposed for VAD over the years. In this study, we apply two of them: Short-Time Energy (STE), a time-domain feature, and the Spectral Centroid, a frequency-domain feature. The speech segments extracted by this VAD method are intended as input to a phonetic segmentation method for Arabic and Moroccan dialect speech signals. We evaluated the method on 400 sentences from the Arabphone corpus, recorded in both noisy and noise-free environments. It achieved a promising accuracy of 85%, comparable to that of some previously proposed methods.
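The two features named in the abstract can be illustrated with a short sketch. The Python below is not the authors' implementation. It assumes that STE is the mean squared amplitude of a frame and that the spectral centroid is the magnitude-weighted mean frequency of the frame's one-sided spectrum. It also assumes that each feature's threshold is simply its mean over all frames (a hypothetical rule, since the abstract does not say how thresholds are set) and that a frame is labeled speech only when both features reach their thresholds. The helper names (`frames`, `detect_speech`, `frame_accuracy`) and the default frame sizes are illustrative choices. A naive DFT keeps the sketch dependency-free; a practical version would use an FFT such as `numpy.fft.rfft`.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Split a 1-D signal into frames of frame_len samples every hop samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Time-domain feature: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_centroid(frame, sample_rate):
    """Frequency-domain feature: magnitude-weighted mean frequency (Hz).

    Uses a naive O(N^2) DFT over the one-sided spectrum for clarity.
    """
    n = len(frame)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):
        bin_value = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n))
        magnitude = abs(bin_value)
        weighted += (k * sample_rate / n) * magnitude  # bin frequency * weight
        total += magnitude
    # An all-zero frame has no spectral energy; report a centroid of 0 Hz.
    return weighted / total if total > 0 else 0.0


def detect_speech(signal, sample_rate, frame_len=256, hop=128):
    """Label each frame True (speech) or False (silence/noise).

    ASSUMPTION (hypothetical rule): each threshold is the mean of its feature
    over all frames, and a frame counts as speech only if both features
    reach their thresholds.
    """
    segments = frames(signal, frame_len, hop)
    if not segments:
        return []
    energies = [short_time_energy(f) for f in segments]
    centroids = [spectral_centroid(f, sample_rate) for f in segments]
    energy_threshold = sum(energies) / len(energies)
    centroid_threshold = sum(centroids) / len(centroids)
    return [e >= energy_threshold and c >= centroid_threshold
            for e, c in zip(energies, centroids)]


def frame_accuracy(predicted, reference):
    """Fraction of frames whose predicted label matches the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)
```

On a synthetic signal of 256 zero samples followed by 256 samples of a 1 kHz tone sampled at 8 kHz, `detect_speech(sig, 8000, frame_len=64, hop=64)` labels the four silent frames as non-speech and the four tone frames as speech. `frame_accuracy` computes the fraction of frames whose label matches a reference; the abstract does not state whether the reported 85% is measured at the frame level.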

Supported by LIPIM.

Author information

Correspondence to Hind Ait Mait.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ait Mait, H., Aboutabit, N. (2023). An Unsupervised Voice Activity Detection Using Time-Frequency Features. In: Aboutabit, N., Lazaar, M., Hafidi, I. (eds) Advances in Machine Intelligence and Computer Science Applications. ICMICSA 2022. Lecture Notes in Networks and Systems, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-031-29313-9_21
