Relational recurrent neural networks for polyphonic sound event detection

Abstract

A smart environment is one of the application scenarios of the Internet of Things (IoT). To provide a ubiquitous smart environment for humans, a variety of technologies have been developed. In a smart environment system, sound event detection is one of the fundamental technologies: it automatically senses sound changes in the environment and detects the sound events that cause them. In this paper, we propose the use of a Relational Recurrent Neural Network (RRNN) for polyphonic sound event detection, called RRNN-SED, which exploits the strength of RRNNs in long-term temporal context extraction and relational reasoning across a polyphonic sound signal. Unlike previous sound event detection methods, which rely heavily on convolutional neural networks or recurrent neural networks, the proposed RRNN-SED method can handle long-duration and overlapping sound events in polyphonic sound event detection. Specifically, because the historical information memorized inside the RRNN can interact across a polyphonic sound signal, the proposed RRNN-SED method is effective and efficient at extracting temporal context information and reasoning about the distinctive relational characteristics of the target sound events. Experimental results on two public datasets show that the proposed method achieved better sound event detection results in terms of segment-based F-score and segment-based error rate.
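To make the general idea concrete, the sketch below shows how a relational-memory-style recurrent cell can be wrapped in a standard Keras RNN layer and combined with per-frame sigmoid outputs for multi-label (polyphonic) event tagging. This is a minimal illustration under our own assumptions, not the authors' implementation: the feature and class counts (40 log-mel bands, 6 event classes, 4 memory slots) and all identifiers such as `RelationalMemoryCell` and `build_rrnn_sed` are hypothetical, and the cell omits the gating and multi-layer attention of the full relational memory core of Santoro et al.

```python
# Illustrative sketch only: a simplified relational-memory recurrent cell for
# frame-level polyphonic sound event detection. All sizes and names below are
# assumptions for the example, not taken from the paper.
import tensorflow as tf

NUM_MEL_BANDS = 40   # assumed input feature size per audio frame
NUM_CLASSES = 6      # assumed number of target sound event classes
NUM_SLOTS = 4        # memory slots that interact through self-attention
SLOT_SIZE = 64


class RelationalMemoryCell(tf.keras.layers.Layer):
    """Simplified relational memory cell: a fixed set of memory slots is
    updated at every frame by multi-head attention over [memory; input]."""

    def __init__(self, num_slots=NUM_SLOTS, slot_size=SLOT_SIZE, num_heads=4, **kwargs):
        super().__init__(**kwargs)
        self.num_slots = num_slots
        self.slot_size = slot_size
        self.state_size = num_slots * slot_size   # flattened memory carried between frames
        self.output_size = num_slots * slot_size
        self.input_proj = tf.keras.layers.Dense(slot_size)
        self.attention = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=slot_size // num_heads)
        self.mlp = tf.keras.layers.Dense(slot_size, activation="relu")
        self.norm = tf.keras.layers.LayerNormalization()

    def call(self, inputs, states):
        # states[0]: (batch, num_slots * slot_size) -> (batch, num_slots, slot_size)
        memory = tf.reshape(states[0], (-1, self.num_slots, self.slot_size))
        frame = self.input_proj(inputs)[:, tf.newaxis, :]          # (batch, 1, slot_size)
        # Slots attend over themselves plus the current audio frame, so stored
        # temporal context can interact ("relational reasoning" across the signal).
        attended = self.attention(query=memory,
                                  value=tf.concat([memory, frame], axis=1))
        memory = self.norm(memory + self.mlp(attended))
        flat = tf.reshape(memory, (-1, self.num_slots * self.slot_size))
        return flat, [flat]


def build_rrnn_sed(num_frames=None):
    """Frame-level multi-label tagger: one sigmoid per event class per frame,
    so overlapping (polyphonic) events can be active at the same time."""
    frames = tf.keras.Input(shape=(num_frames, NUM_MEL_BANDS))
    hidden = tf.keras.layers.RNN(RelationalMemoryCell(), return_sequences=True)(frames)
    events = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid"))(hidden)
    model = tf.keras.Model(frames, events)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


if __name__ == "__main__":
    build_rrnn_sed(num_frames=100).summary()
```

For evaluation, segment-based metrics of the kind reported in the abstract compare the system output and the reference annotation in fixed-length segments; the segment-based error rate is commonly defined as ER = (S + D + I) / N, where S, D and I are substitutions, deletions and insertions accumulated over segments and N is the number of reference events.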

Acknowledgments

This work is partially supported by the National Key R&D Program of China (2018YFB1003203), the Natural Science Foundation of Zhejiang Province (No. LY18F010008), the National Science Foundation of China (No. 61672528, 61773392), and the Marsden Fund of New Zealand.

Author information

Corresponding authors

Correspondence to Ruili Wang or Wanting Ji.

About this article

Cite this article

Ma, J., Wang, R., Ji, W. et al. Relational recurrent neural networks for polyphonic sound event detection. Multimed Tools Appl 78, 29509–29527 (2019). https://doi.org/10.1007/s11042-018-7142-7
