Learning Multi-Resolution Representations for Acoustic Scene Classification via Neural Networks

Yang, Zijiang; Qian, Kun; Ren, Zhao; Baird, Alice; Zhang, Zixing; Schuller, Björn

doi:10.1007/978-981-15-2756-2_11

Conference paper
First Online: 22 December 2019

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 635)

Abstract

This study compares wavelet features with conventional temporal and spectral features for acoustic scene classification, testing the effectiveness of each feature set when combined with neural networks. The system is evaluated on the TUT Acoustic Scenes 2017 Database. The model using wavelet energy features achieves 74.8 % and 60.2 % on the development and evaluation sets, respectively, which is better than the model using the temporal and spectral feature set (72.9 % and 59.4 %). Additionally, to improve the generalisation and robustness of the models, a decision fusion method based on the posterior probability of each audio scene is used. Compared with the baseline system of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) challenge, the best decision fusion model achieves 79.2 % and 63.8 % on the development and evaluation sets, respectively. Both results significantly exceed the baseline results of 74.8 % and 61.0 % (confirmed by a one-tailed z-test, \(p<0.01\) and \(p<0.05\), respectively).
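The abstract does not say how the per-scene posteriors are combined, so the following is only a minimal sketch of one common form of posterior-based decision fusion: averaging the class posteriors of several models and picking the highest-scoring scene. The averaging rule, the function names `fuse_posteriors` and `predict_scene`, the three scene labels, and the probability values are all illustrative assumptions, not the authors' method or data.

```python
from typing import List, Sequence


def fuse_posteriors(model_posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class posteriors of several models for one audio clip.

    Assumption: a plain unweighted mean. The paper's actual fusion rule
    is not given in the abstract.
    """
    n_models = len(model_posteriors)
    n_classes = len(model_posteriors[0])
    if any(len(p) != n_classes for p in model_posteriors):
        raise ValueError("all models must score the same set of classes")
    return [
        sum(p[c] for p in model_posteriors) / n_models
        for c in range(n_classes)
    ]


def predict_scene(model_posteriors: Sequence[Sequence[float]],
                  labels: Sequence[str]) -> str:
    """Return the label whose fused posterior is largest."""
    fused = fuse_posteriors(model_posteriors)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical posteriors for a single clip from two models
# (values invented for illustration only).
labels = ["beach", "bus", "park"]
wavelet_model = [0.40, 0.35, 0.25]    # favours "beach" on its own
spectral_model = [0.20, 0.60, 0.20]   # favours "bus" on its own

print(predict_scene([wavelet_model, spectral_model], labels))  # prints "bus"
```

In this toy case the two models disagree, and the fused posterior (0.30, 0.475, 0.225) follows the more confident model. A weighted mean or a confidence-based selection rule would be an equally plausible variant.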

Author information

Authors and Affiliations

ZD.B Chair of Embedded Intelligence for Health Care & Wellbeing, Universität Augsburg, Augsburg, 86159, Germany
Zijiang Yang, Zhao Ren, Alice Baird & Björn Schuller
Educational Physiology Laboratory, Graduate School of Education, The University of Tokyo, Tokyo, 113-0033, Japan
Kun Qian
Group on Language, Audio & Music, Imperial College London, London, SW7 2AZ, United Kingdom
Zixing Zhang & Björn Schuller

Corresponding author

Correspondence to Kun Qian.

Editor information

Editors and Affiliations

School of Computer Science and Tech., Harbin Institute of Technology (HIT), Harbin, Heilongjiang, China
Haifeng Li
Beijing Univ. of Posts and Telecom., Beijing, China
Shengchen Li
School of Computer Science and Tech., Harbin Institute of Technology (HIT), Harbin, Heilongjiang, China
Lin Ma
School of Computer and Information Eng., Heilongjiang University of Science and Technology, Harbin, Heilongjiang, China
Chunying Fang
The Acoustical Society of Beijing, Xicheng, Beijing, China
Yidan Zhu

About this paper

Cite this paper

Yang, Z., Qian, K., Ren, Z., Baird, A., Zhang, Z., Schuller, B. (2020). Learning Multi-Resolution Representations for Acoustic Scene Classification via Neural Networks. In: Li, H., Li, S., Ma, L., Fang, C., Zhu, Y. (eds) Proceedings of the 7th Conference on Sound and Music Technology (CSMT). Lecture Notes in Electrical Engineering, vol 635. Springer, Singapore. https://doi.org/10.1007/978-981-15-2756-2_11

DOI: https://doi.org/10.1007/978-981-15-2756-2_11
Published: 22 December 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2755-5
Online ISBN: 978-981-15-2756-2
eBook Packages: Engineering, Engineering (R0)