Abstract
This study investigates the performance of wavelet features as well as conventional temporal and spectral features for acoustic scene classification, testing the effectiveness of both feature sets when combined with neural networks. The TUT Acoustic Scenes 2017 Database is used to evaluate the system. The model using wavelet energy features achieved 74.8 % and 60.2 % on the development and evaluation sets, respectively, which is better than the model using the temporal and spectral feature set (72.9 % and 59.4 %). Additionally, to improve the generalisation and robustness of the models, a decision fusion method based on the posterior probability of each audio scene is used. The best decision fusion model achieves 79.2 % and 63.8 % on the development and evaluation sets, respectively; both results significantly exceed those of the baseline system of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) challenge, 74.8 % and 61.0 % (confirmed by a one-tailed z-test, \(p<0.01\) and \(p<0.05\), respectively).
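The abstract does not state the exact fusion rule, so the following is only a minimal sketch of one common form of posterior-based decision fusion: a weighted average of the class posteriors from several models, followed by an argmax. The function name `fuse_posteriors`, the equal default weights, the scene labels, and all probability values are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the posteriors below are made-up numbers,
# not results from the paper, and the averaging rule is an assumption.

def fuse_posteriors(posteriors_per_model, weights=None):
    """Decision-level fusion by weighted averaging of class posteriors.

    posteriors_per_model: list of per-model posterior vectors, each a list
    of probabilities over the same ordered set of scene classes.
    weights: optional per-model weights (hypothetical; defaults to equal).
    Returns (fused_posterior, predicted_class_index).
    """
    n_models = len(posteriors_per_model)
    n_classes = len(posteriors_per_model[0])
    if any(len(p) != n_classes for p in posteriors_per_model):
        raise ValueError("all models must score the same classes")
    if weights is None:
        weights = [1.0 / n_models] * n_models
    total = sum(weights)
    # Weighted mean of the posteriors, computed class by class.
    fused = [
        sum(w * p[c] for w, p in zip(weights, posteriors_per_model)) / total
        for c in range(n_classes)
    ]
    # The fused decision is the class with the highest averaged posterior.
    predicted = max(range(n_classes), key=lambda c: fused[c])
    return fused, predicted


# Hypothetical posteriors for one audio segment over three scene classes.
scenes = ["beach", "bus", "cafe"]
wavelet_model = [0.50, 0.30, 0.20]
spectral_model = [0.20, 0.60, 0.20]

fused, idx = fuse_posteriors([wavelet_model, spectral_model])
print(scenes[idx], [round(p, 2) for p in fused])
```

In this toy case the two models disagree on the top class, and the averaged posterior decides between them. Other rules, such as a product of posteriors or a maximum rule, would fit the same interface.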
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yang, Z., Qian, K., Ren, Z., Baird, A., Zhang, Z., Schuller, B. (2020). Learning Multi-Resolution Representations for Acoustic Scene Classification via Neural Networks. In: Li, H., Li, S., Ma, L., Fang, C., Zhu, Y. (eds) Proceedings of the 7th Conference on Sound and Music Technology (CSMT). Lecture Notes in Electrical Engineering, vol 635. Springer, Singapore. https://doi.org/10.1007/978-981-15-2756-2_11
DOI: https://doi.org/10.1007/978-981-15-2756-2_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2755-5
Online ISBN: 978-981-15-2756-2
eBook Packages: Engineering, Engineering (R0)