Abstract
The classification of environmental sounds is important for emerging applications such as automatic audio surveillance, audio forensics, and robot navigation. Existing techniques combine multiple features and stack many CNN layers (very deep learning) to reach the desired accuracy. Instead of using many features and going deeper by stacking layers, which is resource intensive, this paper proposes a novel technique that uses only a single feature, the Mel-Frequency Cepstral Coefficients (MFCC), and just three CNN layers. We demonstrate that such a simple network can considerably outperform several conventional and deep learning-based algorithms. Through fine-tuning of the input parameters, we obtain a model with a significantly less complex architecture that nevertheless achieves an accuracy of 95.59%, comparable to state-of-the-art deep models on the UrbanSound8k dataset.
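The pipeline the abstract describes (a single MFCC feature map fed through three convolutional layers and a softmax classifier) can be sketched as follows. This is a minimal, illustrative numpy forward pass only, not the authors' implementation: the input shape (40 coefficients × 44 frames), filter counts, kernel sizes, and pooling choices are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w, b):
    """Valid (no-padding) 2-D convolution: x is (C_in, H, W), w is (C_out, C_in, kH, kW)."""
    c_out, _, kh, kw = w.shape
    H, W = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(x[:, i:i+kh, j:j+kw] * w[o]) + b[o]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, k=2):
    """Non-overlapping k x k max pooling, truncating odd edges."""
    c, H, W = x.shape
    H2, W2 = H // k, W // k
    return x[:, :H2*k, :W2*k].reshape(c, H2, k, W2, k).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical MFCC input: 1 channel, 40 coefficients x 44 frames (shapes illustrative only)
mfcc = rng.standard_normal((1, 40, 44))

# Three conv/ReLU/pool stages, then a dense softmax over the 10 UrbanSound8k classes
w1, b1 = 0.1 * rng.standard_normal((8, 1, 3, 3)), np.zeros(8)
w2, b2 = 0.1 * rng.standard_normal((16, 8, 3, 3)), np.zeros(16)
w3, b3 = 0.1 * rng.standard_normal((32, 16, 3, 3)), np.zeros(32)

h = max_pool(relu(conv2d(mfcc, w1, b1)))   # (8, 19, 21)
h = max_pool(relu(conv2d(h, w2, b2)))      # (16, 8, 9)
h = max_pool(relu(conv2d(h, w3, b3)))      # (32, 3, 3)

flat = h.reshape(-1)
w_fc = 0.01 * rng.standard_normal((10, flat.size))
probs = softmax(w_fc @ flat)
print(probs.shape)  # (10,)
```

In practice the MFCC matrix would be extracted from each audio clip (e.g. with a standard audio library) and the weights learned by backpropagation; the point of the sketch is only that three small convolutional stages over one 2-D feature map suffice to produce a 10-class prediction.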
Ethics declarations
Conflict of interest
We have no conflicts of interest to disclose.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Al-Hattab, Y.A., Zaki, H.F. & Shafie, A.A. Rethinking environmental sound classification using convolutional neural networks: optimized parameter tuning of single feature extraction. Neural Comput & Applic 33, 14495–14506 (2021). https://doi.org/10.1007/s00521-021-06091-7