
Rethinking environmental sound classification using convolutional neural networks: optimized parameter tuning of single feature extraction

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

The classification of environmental sounds is important for emerging applications such as automatic audio surveillance, audio forensics, and robot navigation. Existing techniques combine multiple features and stack many CNN layers (very deep learning) to reach the desired accuracy. Instead of using many features and going deeper by stacking layers, which is resource-intensive, this paper proposes a novel technique that uses only a single feature, the Mel-Frequency Cepstral Coefficients (MFCC), and just three CNN layers. We demonstrate that such a simple network can considerably outperform several conventional and deep learning-based algorithms. Through fine-tuning of the input-data parameters, we report a model that is significantly less complex in architecture yet records an accuracy of 95.59%, comparable to state-of-the-art deep models on the UrbanSound8k dataset.
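The single-feature pipeline described in the abstract rests on MFCC extraction: frame the clip, take a windowed power spectrum, pool it through a triangular mel filterbank, and apply a DCT to the log mel energies. Below is a minimal NumPy sketch of that computation. The frame size, hop length, and the choice of 40 mel bands and 40 coefficients are illustrative assumptions for a typical UrbanSound8k clip, not the tuned parameter settings reported in the paper.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filterbank, shape (n_filters, n_fft//2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter centers are equally spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):                 # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=22050, n_fft=2048, hop=512, n_mels=40, n_coeffs=40):
    """Frame -> Hann window -> power spectrum -> mel energies -> log -> DCT-II."""
    frames = [signal[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    spectra = np.abs(np.fft.rfft(np.array(frames), n=n_fft)) ** 2
    mel_energies = spectra @ mel_filterbank(n_mels, n_fft, sr).T
    log_mel = np.log(mel_energies + 1e-10)
    # DCT-II over the mel axis decorrelates the log energies into cepstral coeffs
    n = log_mel.shape[1]
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs),
                                    2 * np.arange(n) + 1) / (2 * n))
    return log_mel @ basis.T  # shape: (n_frames, n_coeffs)

# Example: a 4-second clip at 22.05 kHz (UrbanSound8k clips are <= 4 s)
clip = np.random.randn(4 * 22050)
features = mfcc(clip)
print(features.shape)  # -> (169, 40)
```

The resulting (frames × coefficients) matrix is the kind of 2D map that can be fed directly to a small CNN, which is why a single feature can suffice when its extraction parameters are tuned carefully.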


Figs. 1–7 (not shown in this preview)



Author information

Correspondence to Yousef Abd Al-Hattab.

Ethics declarations

Conflict of interest

We have no conflicts of interest to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Al-Hattab, Y.A., Zaki, H.F. & Shafie, A.A. Rethinking environmental sound classification using convolutional neural networks: optimized parameter tuning of single feature extraction. Neural Comput & Applic 33, 14495–14506 (2021). https://doi.org/10.1007/s00521-021-06091-7

