Abstract
Acoustic scene classification is the process of characterizing and classifying environments from sound recordings. The first step is to generate features (representations) from the recorded sound; the background environment is then classified from those features. However, the choice of representation has a dramatic effect on classification accuracy. In this paper, we explored the effect of three such representations on classification accuracy using neural networks. We investigated spectrogram, MFCC, and embedding representations using different CNNs and autoencoders. Our dataset consists of sounds from three settings each of indoor and outdoor environments; thus, it contains sounds from six different kinds of environments. We found that the spectrogram representation yields the highest classification accuracy, while MFCCs yield the lowest. We report our findings, insights, and some guidelines for achieving better accuracy in sound-based environment classification.
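To make the first step concrete, the following is a minimal sketch of how a log-power spectrogram can be computed from a waveform. It is not the authors' pipeline: the function names, the frame length of 64 samples, the hop of 32 samples, and the synthetic 1 kHz test tone are all illustrative assumptions, and the naive DFT is used only to keep the example dependency-free. In practice an audio library such as librosa would normally do this work.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames, dropping the incomplete tail."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def log_spectrogram(signal, frame_len=64, hop=32, eps=1e-10):
    """Log-power spectrogram: a list of frames, each a list of bin values in dB."""
    window = hann(frame_len)
    spec = []
    for frame in frames(signal, frame_len, hop):
        windowed = [w * x for w, x in zip(window, frame)]
        # eps avoids log10(0) for bins with no energy.
        spec.append([10 * math.log10(m * m + eps)
                     for m in magnitude_spectrum(windowed)])
    return spec


# Hypothetical stand-in for a recording: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = log_spectrogram(tone)
# The strongest bin of the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

The resulting time-by-frequency grid is the kind of image-like input a CNN can consume. An MFCC representation would add a mel filterbank and a discrete cosine transform on top of these log-power values, which compresses each frame into a handful of coefficients.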
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ananya, I.J., Suad, S., Choudhury, S.H., Khan, M.A. (2021). A Comparative Study on Approaches to Acoustic Scene Classification Using CNNs. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Computational Intelligence. MICAI 2021. Lecture Notes in Computer Science, vol 13067. Springer, Cham. https://doi.org/10.1007/978-3-030-89817-5_6
DOI: https://doi.org/10.1007/978-3-030-89817-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89816-8
Online ISBN: 978-3-030-89817-5
eBook Packages: Computer Science, Computer Science (R0)