Stage Audio Classifier Using Artificial Neural Network

  • M. S. Arun Sankar
  • Tharak Sai Bobba
  • P. S. Sathi Devi
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 637)


Abstract

Perceptual quality of the audio signal at the receiver and the transmission data rate are the major concerns for speech codec developers, yet the two are, in general, inversely related. In the 4G era, 3GPP launched the Enhanced Voice Services (EVS) codec, which operates at multiple data rates and includes a six-stage speech classifier based on a threshold-driven GMM statistical model. In this work, we propose a seven-stage audio classifier for voiced, unvoiced, transition, multi-speaker, silence, background-noise and music signals using a neural network trained with the Levenberg-Marquardt (LM) algorithm. Compared with the conventional statistical approach, which requires manually determined thresholds, the neural-network method simplifies the categorization process, especially when a large number of parameters is used. Classification is performed on seven extracted features that together constitute a 32-dimensional vector. The TIMIT and NOIZEUS databases are used to generate the dataset, and a classification accuracy of 94% is obtained. Since the network performs efficiently with a small number of neurons, its complexity is also low.
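The pipeline summarized above, 32-dimensional feature vectors mapped to seven audio classes by a small neural network, can be sketched as follows. This is a hypothetical illustration only, with random data standing in for the extracted features, not the authors' implementation; scikit-learn offers no Levenberg-Marquardt solver, so the second-order `lbfgs` solver is substituted here, and the hidden-layer size is an arbitrary choice.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Stand-in dataset: 700 frames, each a 32-dimensional feature vector,
# labeled with one of 7 classes (voiced, unvoiced, transition,
# multi-speaker, silence, background noise, music).
rng = np.random.default_rng(0)
X = rng.normal(size=(700, 32))
y = rng.integers(0, 7, size=700)

# Small single-hidden-layer network; 'lbfgs' substitutes for the
# Levenberg-Marquardt training used in the paper.
clf = MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs",
                    max_iter=500, random_state=0)
clf.fit(X, y)

# Classify a few frames.
pred = clf.predict(X[:5])
print(pred.shape, len(clf.classes_))
```

With real features in place of the random arrays, the same structure (32 inputs, one small hidden layer, 7 outputs) matches the low-complexity design the abstract describes.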


Keywords: Speech coding · LPC · Neural network · Speech classifier · CELP · Speech


References

  1. Atal, B.S.: The history of linear prediction. IEEE Signal Process. Mag. 23(2), 154–161 (2006)
  2. Spanias, A.S.: Speech coding: a tutorial review. Proc. IEEE 82(10), 1541–1582 (1994)
  3. Chu, W.C.: Speech Coding Algorithms: Foundation and Evolution of Standardized Coders. Wiley (2004)
  4. Recommendation 3GPP TS 26.441: Codec for Enhanced Voice Services (EVS): General Overview, 3GPP, Sept 2014
  5. Li, Z., Xie, Z., Wang, J., Grancharov, V., Liu, W.: Optimization of EVS speech/music classifier based on deep learning. In: 14th IEEE International Conference on Signal Processing (ICSP), pp. 260–264 (2018)
  6. Ghiselli-Crippa, T., El-Jaroudi, A.: Voiced-unvoiced-silence classification of speech using neural nets. In: IJCNN-91-Seattle International Joint Conference on Neural Networks, vol. 2, pp. 851–856 (1991)
  7. Huiqun, D., O'Shaughnessy, D.: Voiced-unvoiced-silence speech sound classification based on unsupervised learning. In: 2007 IEEE International Conference on Multimedia and Expo, IEEE (2007)
  8. Basterrech, S., Mohammed, S., Rubino, G., Soliman, M.: Levenberg-Marquardt training algorithms for random neural networks. Comput. J. 54(1), 125–135 (2011)
  9. Atal, B., Rabiner, L.: A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition. IEEE Trans. Acoust. Speech Signal Process. ASSP-24, 201–212 (1976)
  10. Qi, Y., Hunt, B.R.: Voiced-unvoiced-silence classifications of speech using hybrid features and a network classifier. IEEE Trans. Speech Audio Process. 1(2), 250–255 (1993)
  11. ITU-T: G.729: Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP) (2007)
  12. ITU-T: G.720.1: Generic sound activity detector, ITU-T (2010)
  13. Garofolo, J.S., Lamel, L.F., Fisher, W.M., Fiscus, J.G., Pallett, D.S., Dahlgren, N.L., Zue, V.: TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1. Linguistic Data Consortium, Philadelphia (1993)
  14. Hu, Y., Loizou, P.: Subjective evaluation and comparison of speech enhancement algorithms. Speech Commun. 49, 588–601 (2007)
  15. Boersma, P., Weenink, D.: Praat: Doing Phonetics by Computer, Version 6.0.40 (2018)
  16. Cannam, C., Landone, C., Sandler, M.: An open source application for viewing, analysing, and annotating music audio files. In: Proceedings of the ACM Multimedia 2010 International Conference, Firenze, Italy, Oct 2010, pp. 1467–1468

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • M. S. Arun Sankar (1)
  • Tharak Sai Bobba (1)
  • P. S. Sathi Devi (1)

  1. Department of Electronics and Communication Engineering, National Institute of Technology Calicut, Calicut, India
