Abstract
In this work, Gammatone (GT) filter bank energy features are combined with a deep neural network (GT-DNN) for robust acoustic event detection (AED) in a smart home environment for monitoring human activities. The Gammatone filter bank models the human auditory system and decomposes environmental sound events into energy features across multiple frequency bands. These features are learned during the DNN training phase, analogous to how the human brain performs the AED task. Gammatone filter bank energy features are found to be superior to the popular Mel-scale filter bank features, and the Gammatone filter bank output yields smoother spectrogram patterns that help identify the dominant characteristic features of the target events. Moreover, the auditory-motivated Gammatone approach shows greater robustness to noise than Mel-scale filter bank features. The proposed GT-DNN model is tested on a single board computer (SBC) prototype built on the popular Raspberry Pi 4 Model B, and experimental F-score results show impressive real-time AED performance. Furthermore, the model parameters are optimised and the model is used to classify acoustic events from the Freiburg-106 event dataset. Noise from the NOISEX-92 dataset is mixed with the clean event data used to train the model. AED performance in terms of F-score at different signal-to-noise ratios (SNRs) is compared between GT-DNN and baseline Mel-scale filter bank energy features, and the GT-DNN method yields improved results. In addition, confidence scores for 10 different event classes are evaluated in the presence of the worst-case babble noise at 0 dB SNR, and excellent classification results are obtained with the proposed method. Detailed analysis and results are given in support of each claim.
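The front end described above, decomposing a signal with an ERB-spaced Gammatone filter bank and taking per-band log energies per frame, can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the filter design follows the standard Patterson–Holdsworth/Slaney formulation, and the band count, frame length, and hop size are assumed values chosen only for the example.

```python
import numpy as np

def erb_space(low_hz, high_hz, n_bands):
    """Center frequencies equally spaced on the ERB-rate scale (Slaney, 1993)."""
    ear_q, min_bw = 9.26449, 24.7
    return -(ear_q * min_bw) + np.exp(
        np.arange(1, n_bands + 1)
        * (np.log(low_hz + ear_q * min_bw) - np.log(high_hz + ear_q * min_bw))
        / n_bands
    ) * (high_hz + ear_q * min_bw)

def gammatone_fbank_energies(signal, fs, n_bands=32, frame_len=400, hop=160):
    """Per-frame log energies from a 4th-order Gammatone filter bank.

    Returns an array of shape (n_frames, n_bands), the feature matrix
    that would be fed to the DNN classifier.
    """
    cfs = erb_space(50.0, fs / 2.0, n_bands)
    t = np.arange(int(0.05 * fs)) / fs          # 50 ms impulse responses
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, n_bands))
    for k, cf in enumerate(cfs):
        erb = 24.7 * (4.37 * cf / 1000.0 + 1.0)  # Glasberg & Moore bandwidth
        b = 1.019 * 2.0 * np.pi * erb
        # 4th-order gammatone impulse response: t^3 * exp(-bt) * cos(2*pi*cf*t)
        g = t ** 3 * np.exp(-b * t) * np.cos(2.0 * np.pi * cf * t)
        band = np.convolve(signal, g, mode="same")
        for i in range(n_frames):
            frame = band[i * hop : i * hop + frame_len]
            feats[i, k] = np.log(np.sum(frame ** 2) + 1e-10)  # log band energy
    return feats
```

With a 16 kHz input, `frame_len=400` and `hop=160` correspond to the common 25 ms window / 10 ms hop framing; each row of the output is one frame's 32-band Gammatone energy vector.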
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Mondal, S., Barman, A.D. Human auditory model based real-time smart home acoustic event monitoring. Multimed Tools Appl 81, 887–906 (2022). https://doi.org/10.1007/s11042-021-11455-1