Abstract
The classification of environmental sounds is important for emerging applications such as automatic audio surveillance, audio forensics, and robot navigation. Existing techniques combine multiple features and stack many CNN layers (very deep learning) to reach the desired accuracy. Instead of using many features and going deeper by stacking layers, which is resource intensive, this paper proposes a novel technique that uses only a single feature, the Mel-Frequency Cepstral Coefficients (MFCC), and just three CNN layers. We demonstrate that such a simple network can considerably outperform several conventional and deep learning-based algorithms. Through fine-tuning of the input parameters, we obtain a model with a significantly less complex architecture that nevertheless achieves an accuracy of 95.59%, comparable to state-of-the-art deep models on the UrbanSound8k dataset.
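The pipeline the abstract describes (a single MFCC feature map fed through three convolutional layers and a softmax classifier) can be sketched as follows. This is a minimal, illustrative numpy forward pass only, not the authors' implementation: the input shape (40 coefficients × 44 frames), filter counts, kernel sizes, and pooling choices are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w, b):
    """Valid (no-padding) 2-D convolution: x is (C_in, H, W), w is (C_out, C_in, kH, kW)."""
    c_out, _, kh, kw = w.shape
    H, W = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(x[:, i:i+kh, j:j+kw] * w[o]) + b[o]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, k=2):
    """Non-overlapping k x k max pooling, truncating odd edges."""
    c, H, W = x.shape
    H2, W2 = H // k, W // k
    return x[:, :H2*k, :W2*k].reshape(c, H2, k, W2, k).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical MFCC input: 1 channel, 40 coefficients x 44 frames (shapes illustrative only)
mfcc = rng.standard_normal((1, 40, 44))

# Three conv/ReLU/pool stages, then a dense softmax over the 10 UrbanSound8k classes
w1, b1 = 0.1 * rng.standard_normal((8, 1, 3, 3)), np.zeros(8)
w2, b2 = 0.1 * rng.standard_normal((16, 8, 3, 3)), np.zeros(16)
w3, b3 = 0.1 * rng.standard_normal((32, 16, 3, 3)), np.zeros(32)

h = max_pool(relu(conv2d(mfcc, w1, b1)))   # (8, 19, 21)
h = max_pool(relu(conv2d(h, w2, b2)))      # (16, 8, 9)
h = max_pool(relu(conv2d(h, w3, b3)))      # (32, 3, 3)

flat = h.reshape(-1)
w_fc = 0.01 * rng.standard_normal((10, flat.size))
probs = softmax(w_fc @ flat)
print(probs.shape)  # (10,)
```

In practice the MFCC matrix would be extracted from each audio clip (e.g. with a standard audio library) and the weights learned by backpropagation; the point of the sketch is only that three small convolutional stages over one 2-D feature map suffice to produce a 10-class prediction.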
Ethics declarations
Conflict of interest
We have no conflicts of interest to disclose.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Al-Hattab, Y.A., Zaki, H.F. & Shafie, A.A. Rethinking environmental sound classification using convolutional neural networks: optimized parameter tuning of single feature extraction. Neural Comput & Applic 33, 14495–14506 (2021). https://doi.org/10.1007/s00521-021-06091-7