
International Journal of Speech Technology, Volume 21, Issue 4, pp 753–760

Line spectral frequency-based features and extreme learning machine for voice activity detection from audio signal

  • Himadri Mukherjee
  • Sk. Md. Obaidullah
  • K. C. Santosh
  • Santanu Phadikar
  • Kaushik Roy

Abstract

Voice activity detection (VAD) refers to the task of identifying vocal segments in an audio clip. It reduces computational overhead and improves the recognition performance of speech-based systems by discarding the non-vocal portions of an input signal. In this paper, a VAD technique is presented that uses line spectral frequency-based statistical features, termed LSF-S, coupled with extreme learning machine-based classification. The experiments were performed on a database of more than 350 h of audio drawn from multifarious sources, and an encouraging overall accuracy of 99.43% was obtained.
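As a rough illustration of the pipeline described above, the sketch below extracts frame-wise line spectral frequencies via linear prediction, summarises them into a clip-level LSF-S vector, and classifies with a single-hidden-layer extreme learning machine. This is a minimal sketch, not the authors' implementation: the LPC order, frame and hop sizes, and the use of per-dimension mean and standard deviation as the LSF-S statistics are assumptions made here for illustration.

# Hypothetical sketch, not the authors' code: frame-wise LSF extraction,
# clip-level LSF-S statistics, and a minimal extreme learning machine (ELM).
# LPC order, frame/hop sizes and the mean/std statistics are assumptions.
import numpy as np


def lpc(frame, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        new_a = a.copy()
        new_a[1:i] += k * a[i - 1:0:-1]
        new_a[i] = k
        a, err = new_a, err * (1.0 - k * k)
    return a


def lpc_to_lsf(a):
    """LSFs: angles in (0, pi) of the unit-circle roots of the sum/difference polynomials."""
    a_ext = np.concatenate([a, [0.0]])
    a_rev = np.concatenate([[0.0], a[::-1]])
    roots = np.concatenate([np.roots(a_ext + a_rev), np.roots(a_ext - a_rev)])
    ang = np.angle(roots)
    # keep one angle per conjugate pair, dropping the trivial roots at z = +/-1
    return np.sort(ang[(ang > 1e-4) & (ang < np.pi - 1e-4)])


def lsf_s_features(signal, order=12, frame_len=400, hop=160):
    """Clip-level LSF-S vector: per-dimension mean and std of frame-wise LSFs (assumed statistics)."""
    window = np.hamming(frame_len)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        if np.dot(frame, frame) < 1e-8:        # skip near-silent frames
            continue
        lsf = lpc_to_lsf(lpc(frame, order))
        if lsf.size == order:                  # guard against degenerate frames
            feats.append(lsf)
    F = np.vstack(feats)
    return np.concatenate([F.mean(axis=0), F.std(axis=0)])


class ELM:
    """Single-hidden-layer ELM: random input weights, least-squares output weights."""

    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # sigmoid hidden activations
        T = np.eye(int(y.max()) + 1)[y]                    # one-hot targets
        self.beta = np.linalg.pinv(H) @ T                  # Moore-Penrose solution
        return self

    def predict(self, X):
        H = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))
        return np.argmax(H @ self.beta, axis=1)


# Usage sketch: rows of X are clip-level LSF-S vectors, y holds labels (0: non-vocal, 1: vocal)
# clf = ELM(n_hidden=600).fit(X_train, y_train)
# y_pred = clf.predict(X_test)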

Keywords

Voice activity detection · Line spectral frequency · Extreme learning machine


Acknowledgements

The authors wish to thank Dr. Chayan Halder of University of Engineering and Management, Kolkata, Miss Payel Rakshit of Maheshtala College, Budge Budge, and Miss Ankita Dhar of West Bengal State University, Barasat, for their help whenever it was needed throughout this work. They would also like to thank Mr. Debajyoti Bose of University of Petroleum and Energy Studies, Dehradun, for his help.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, West Bengal State University, Kolkata, India
  2. Department of Computer Science and Engineering, Aliah University, Kolkata, India
  3. Department of Computer Science, The University of South Dakota, Vermillion, USA
  4. Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India
