A review of deep learning techniques in audio event recognition (AER) applications

Prashanth, Arjun; Jayalakshmi, S. L.; Vedhapriyavadhana, R.

doi:10.1007/s11042-023-15891-z

Published: 14 June 2023

Volume 83, pages 8129–8143, (2024)

Multimedia Tools and Applications

Arjun Prashanth¹,
S. L. Jayalakshmi² &
R. Vedhapriyavadhana¹

Abstract

In day-to-day life, observing human and social actions is highly important for public protection and security. Identifying suspicious activity is also essential in critical environments such as industry, smart homes, nursing homes, and old age homes. In most audio-based applications, the Audio Event Recognition (AER) task plays a vital role in recognizing audio events. Although many approaches focus on the effective implementation of audio-based applications, major research problems remain, such as overlapping events, the presence of background noise, and the lack of benchmark datasets. The main objective of this survey is to identify effective feature extraction methods, robust classifiers, and benchmark datasets. To this end, we present a detailed survey of the features, deep learning classifiers, and datasets used in AER applications. We also summarise the various methods involved in AER applications such as audio spoofing, audio surveillance, and audio fingerprinting. Future directions include setting up a benchmark dataset, identifying semantic features, and exploring transfer learning-based classifiers.

Data availability

Data will be made available on reasonable request.

Funding

No funds, grants, or other support was received.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
Arjun Prashanth & R. Vedhapriyavadhana
School of Engineering and Technology, Department of Computer Science, Pondicherry University (Main Campus), Puducherry, India
S. L. Jayalakshmi

Corresponding author

Correspondence to S. L. Jayalakshmi.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Prashanth, A., Jayalakshmi, S.L. & Vedhapriyavadhana, R. A review of deep learning techniques in audio event recognition (AER) applications. Multimed Tools Appl 83, 8129–8143 (2024). https://doi.org/10.1007/s11042-023-15891-z

Received: 15 January 2022
Revised: 17 May 2023
Accepted: 22 May 2023
Published: 14 June 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15891-z
