A Comparative Study on Approaches to Acoustic Scene Classification Using CNNs

  • Conference paper
Advances in Computational Intelligence (MICAI 2021)

Abstract

Acoustic scene classification is the process of characterizing and classifying environments from sound recordings. The first step is to generate features (representations) from the recorded sound; the second is to classify the background environment from those features. However, the choice of representation has a dramatic effect on classification accuracy. In this paper, we explored the effect of three such representations on classification accuracy using neural networks. We investigated spectrogram, mel-frequency cepstral coefficient (MFCC), and embedding representations using different CNNs and autoencoders. Our dataset consists of sounds from three settings of indoor and outdoor environments, so it contains sounds from six different kinds of environments. We found that the spectrogram representation gives the highest classification accuracy, while the MFCC representation gives the lowest. We report our findings, insights, and some guidelines for achieving better accuracy in sound-based environment classification.
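To make the feature-generation step concrete, the following is a minimal sketch of how a log-magnitude spectrogram can be computed from a raw signal. It is not the authors' pipeline: the function names, the frame length of 64, the hop of 32, and the synthetic 1 kHz test tone are all assumptions chosen for illustration. It uses only the Python standard library and a naive DFT; in practice an FFT-based library such as librosa or NumPy would normally be used instead.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def log_spectrogram(signal, frame_len=64, hop=32, eps=1e-10):
    """Return a list of frames; each frame holds log-magnitudes (dB)
    for frequency bins 0 .. frame_len // 2.

    frame_len, hop, and eps are illustrative defaults, not values
    taken from the paper.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Cut one frame out of the signal and taper its edges.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT for bin k: O(frame_len) per bin, fine for a demo.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(20 * math.log10(abs(acc) + eps))
        frames.append(row)
    return frames


# Hypothetical input: a 1 kHz sine tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1024)]

spec = log_spectrogram(tone)

# Locate the strongest bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

The result is a frames-by-bins grid that can be treated like a single-channel image, which is what makes it a natural input for a CNN. MFCCs are typically derived from a similar spectrum by applying a mel filterbank, a logarithm, and a discrete cosine transform, which compresses each frame into a small number of coefficients.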

Author information

Corresponding author

Correspondence to Shadab Hafiz Choudhury.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ananya, I.J., Suad, S., Choudhury, S.H., Khan, M.A. (2021). A Comparative Study on Approaches to Acoustic Scene Classification Using CNNs. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Computational Intelligence. MICAI 2021. Lecture Notes in Computer Science, vol 13067. Springer, Cham. https://doi.org/10.1007/978-3-030-89817-5_6

  • DOI: https://doi.org/10.1007/978-3-030-89817-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89816-8

  • Online ISBN: 978-3-030-89817-5

  • eBook Packages: Computer Science, Computer Science (R0)
