A Comparative Study on Approaches to Acoustic Scene Classification Using CNNs

Ananya, Ishrat Jahan; Suad, Sarah; Choudhury, Shadab Hafiz; Khan, Mohammad Ashrafuzzaman

doi:10.1007/978-3-030-89817-5_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13067)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

Acoustic scene classification is the process of characterizing and classifying environments from sound recordings. The first step is to generate features (representations) from the recorded sound; the second is to classify the background environment from those features. However, the choice of representation has a dramatic effect on classification accuracy. In this paper, we explored the effect of three such representations on classification accuracy using neural networks. We investigated spectrogram, MFCC, and embedding representations using different CNNs and autoencoders. Our dataset consists of sounds from three settings of indoor and outdoor environments; thus, the dataset contains sounds from six different kinds of environments. We found that the spectrogram representation yields the highest classification accuracy, while MFCC yields the lowest. We report our findings, insights, and some guidelines for achieving better accuracy in environment classification from sound.
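As an illustration of the first step described in the abstract, the following NumPy-only sketch derives two of the three representations, a log-mel spectrogram and MFCCs, from a raw waveform. It is not the authors' implementation, which is not shown in this preview. The sample rate (16 kHz), FFT size (512), hop length (256), number of mel bands (40), number of cepstral coefficients (13), the synthetic 440 Hz test tone, and all function names are assumptions chosen only for the example.

```python
import numpy as np

def stft_power(signal, n_fft=512, hop=256):
    """Power spectrogram of a 1-D signal, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # n_mels triangles need n_mels + 2 edge frequencies.
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Mel-weighted power spectrogram in decibels, shape (n_mels, n_frames)."""
    power = stft_power(signal, n_fft, hop)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power
    return 10.0 * np.log10(mel + 1e-10)  # small offset avoids log(0)

def mfcc(log_mel, n_mfcc=13):
    """Unnormalized DCT-II along the mel axis, keeping the first n_mfcc coefficients."""
    n_mels = log_mel.shape[0]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return basis @ log_mel

# Hypothetical input: one second of a 440 Hz tone standing in for a recording.
sr = 16000
tone = np.sin(2.0 * np.pi * 440.0 * np.arange(sr) / sr)
features = log_mel_spectrogram(tone, sr)
coeffs = mfcc(features)
print(features.shape, coeffs.shape)  # (40, 61) (13, 61)
```

Either array can then be treated as a single-channel image and passed to a CNN classifier. The MFCC matrix is a compressed view of the log-mel spectrogram, since it keeps only the first few cosine-transform coefficients of each frame.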

Author information

Authors and Affiliations

North South University, Dhaka, Bangladesh
Ishrat Jahan Ananya, Sarah Suad, Shadab Hafiz Choudhury & Mohammad Ashrafuzzaman Khan

Corresponding author

Correspondence to Shadab Hafiz Choudhury.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Ildar Batyrshin
Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Alexander Gelbukh
Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Grigori Sidorov

About this paper

Cite this paper

Ananya, I.J., Suad, S., Choudhury, S.H., Khan, M.A. (2021). A Comparative Study on Approaches to Acoustic Scene Classification Using CNNs. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Computational Intelligence. MICAI 2021. Lecture Notes in Computer Science, vol 13067. Springer, Cham. https://doi.org/10.1007/978-3-030-89817-5_6

DOI: https://doi.org/10.1007/978-3-030-89817-5_6
Published: 21 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89816-8
Online ISBN: 978-3-030-89817-5
eBook Packages: Computer Science, Computer Science (R0)
