Learning Multi-Resolution Representations for Acoustic Scene Classification via Neural Networks

  • Conference paper

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 635)

Abstract

This study investigates wavelet features, as well as conventional temporal and spectral features, for acoustic scene classification, and tests the effectiveness of each feature set when combined with neural networks. The system is evaluated on the TUT Acoustic Scenes 2017 database. The model using the wavelet energy feature achieves accuracies of 74.8 % and 60.2 % on the development and evaluation sets, respectively, outperforming the model using the temporal and spectral feature set (72.9 % and 59.4 %). Additionally, to improve the generalisation and robustness of the models, a decision fusion method based on the posterior probability of each acoustic scene is applied. Compared with the baseline system of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) challenge, the best decision fusion model achieves 79.2 % and 63.8 % on the development and evaluation sets, respectively. Both results significantly exceed the baseline results of 74.8 % and 61.0 % (one-tailed z-test, \(p<0.01\) and \(p<0.05\), respectively).
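The abstract names three computational ingredients: wavelet energy features, decision fusion over class posteriors, and a one-tailed z-test on accuracies. The Python sketch below illustrates each one. It is not the authors' implementation, and every function name in it is hypothetical. It rests on several assumptions. First, the wavelet features use a Haar wavelet packet decomposition of fixed depth, computed over a whole toy signal rather than per audio frame. The paper's wavelet family, depth, framing, and any log compression are not given here. Second, fusion is a weighted mean of per-model posteriors. Third, the significance test is a pooled two-proportion z-test. The test uses 4,680 development segments per system as the sample size.

```python
import math
from typing import List, Optional, Sequence, Tuple


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform, returning (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    n = len(x) - len(x) % 2  # ignore a trailing odd sample
    approx = [(x[i] + x[i + 1]) * s for i in range(0, n, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, n, 2)]
    return approx, detail


def wavelet_packet_energies(x: Sequence[float], depth: int) -> List[float]:
    """Relative energy of each of the 2**depth terminal wavelet-packet nodes.

    Unlike a plain DWT, both the approximation and the detail branch are split
    at every level. Nodes come out in natural (Paley) order, not frequency order.
    """
    nodes: List[List[float]] = [list(x)]
    for _ in range(depth):
        next_nodes: List[List[float]] = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    energies = [sum(c * c for c in node) for node in nodes]
    total = sum(energies) or 1.0  # guard against an all-zero (silent) input
    return [e / total for e in energies]


def fuse_posteriors(posteriors: Sequence[Sequence[float]],
                    weights: Optional[Sequence[float]] = None) -> int:
    """Weighted mean of per-model class posteriors; returns the winning class index."""
    m = len(posteriors)
    w = list(weights) if weights else [1.0 / m] * m  # default: plain average
    n_classes = len(posteriors[0])
    fused = [sum(w[k] * posteriors[k][c] for k in range(m)) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: fused[c])


def one_tailed_z_test(acc1: float, acc2: float, n1: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test of H1: acc1 > acc2; returns (z, one-tailed p)."""
    pooled = (acc1 * n1 + acc2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (acc1 - acc2) / se
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability P(Z >= z)
    return z, p


if __name__ == "__main__":
    # Toy 64-sample signal (illustrative only, not audio from the dataset).
    sig = [math.sin(2 * math.pi * 2 * t / 64) + 0.3 * math.sin(2 * math.pi * 20 * t / 64)
           for t in range(64)]
    print(wavelet_packet_energies(sig, depth=3))  # 8 relative energies summing to 1
    # Two hypothetical 3-class posterior vectors for the same audio segment.
    print(fuse_posteriors([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]))
    # Development-set accuracies from the abstract; 4,680 segments is an assumption.
    print(one_tailed_z_test(0.792, 0.748, 4680, 4680))
```

Averaging posteriors is only one possible fusion rule; product or majority-vote rules are common alternatives. The abstract does not say which rule the authors used. Both systems are scored on the same segments, so their outcomes are paired. The independent-samples z-test above is therefore a simplification.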

Author information

Correspondence to Kun Qian.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yang, Z., Qian, K., Ren, Z., Baird, A., Zhang, Z., Schuller, B. (2020). Learning Multi-Resolution Representations for Acoustic Scene Classification via Neural Networks. In: Li, H., Li, S., Ma, L., Fang, C., Zhu, Y. (eds) Proceedings of the 7th Conference on Sound and Music Technology (CSMT). Lecture Notes in Electrical Engineering, vol 635. Springer, Singapore. https://doi.org/10.1007/978-981-15-2756-2_11

  • DOI: https://doi.org/10.1007/978-981-15-2756-2_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-2755-5

  • Online ISBN: 978-981-15-2756-2

  • eBook Packages: Engineering, Engineering (R0)
