Source identification of weak audio signals using attention based convolutional neural network


Abstract

Determining the source of a weak sound signal can be difficult, particularly in busy or noisy surroundings. Traditional methods for source separation and localization rely on hand-engineered features and algorithms that are sensitive to changes in the signal or the noise. In environmental sound classification (ESC), the effectiveness of the representative features extracted from environmental sounds is crucial to classification performance. However, semantically meaningless frames and silent frames frequently hinder ESC performance. To address this problem, we propose a new context-aware attention-based neural network for weak environmental sound source identification. Our method is based on the idea that an attention mechanism can concentrate on important elements of the input signal and suppress irrelevant ones, enabling the network to locate the weak sound's source more effectively. To overcome the limited capacity of the attention model's attention maps, we use context information as an additional input. Identifying weak signals and finding the corresponding context information is itself challenging because of degradation and noise in the signal; to solve this, we propose a novel MFCC-based algorithm for context feature generation. Additionally, the robustness and generalizability of the classification model are improved by using multiple feature extraction techniques, which reduces reliance on any single technique. We test our methodology with experiments on datasets of simulated weak sound signals containing varying amounts of noise and clutter, and we evaluate the performance of our attention-based neural network against a number of established techniques. Our findings demonstrate that, especially in noisy and congested environments, the proposed model outperforms the baselines in source identification accuracy. Overall, our work illustrates the effectiveness of a context-aware attention-based neural network for the identification of weak sound sources and suggests that this strategy is a promising direction for further research.
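The abstract describes the approach only at a high level. As a purely illustrative sketch, and not the authors' published architecture, the following PyTorch snippet shows one way an attention-based CNN could fuse frame-level spectral features with a clip-level context vector and attention-pool over time, so that silent or semantically meaningless frames are down-weighted. All layer sizes, the context-vector dimension, and the concatenation-based fusion are assumptions.

```python
# Minimal sketch of context-conditioned attention pooling over CNN frame
# features. Hypothetical architecture; only the general idea (attention over
# time frames, context as an extra input) comes from the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPooledCNN(nn.Module):
    def __init__(self, n_classes: int, context_dim: int = 32):
        super().__init__()
        # Frame-level feature extractor over (batch, 1, n_mfcc, n_frames)
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                        # pool frequency, keep time
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),             # collapse frequency axis
        )
        # Scores each time frame; softmax over time yields attention weights
        self.attn_score = nn.Linear(64 + context_dim, 1)
        self.classifier = nn.Linear(64 + context_dim, n_classes)

    def forward(self, x, context):
        # x: (batch, 1, n_mfcc, n_frames); context: (batch, context_dim)
        h = self.conv(x).squeeze(2).transpose(1, 2)      # (batch, n_frames, 64)
        ctx = context.unsqueeze(1).expand(-1, h.size(1), -1)
        hc = torch.cat([h, ctx], dim=-1)                 # frame features + context
        w = F.softmax(self.attn_score(hc), dim=1)        # (batch, n_frames, 1)
        pooled = (w * hc).sum(dim=1)                     # attention-weighted pooling
        return self.classifier(pooled)
```

Under these assumptions, low-energy or uninformative frames receive small attention weights, so the pooled representation is dominated by the frames that carry the weak source's signature.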


Availability of data and materials

1. UrbanSound8K: https://urbansounddataset.weebly.com/urbansound8k.html

2. ESC: https://github.com/karoldvl/paper-2015-esc-dataset
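For readers who want to reproduce the feature pipeline on these datasets, the snippet below is a minimal, hypothetical loading sketch for UrbanSound8K, assuming its standard metadata layout (metadata/UrbanSound8K.csv with slice_file_name, fold, and classID columns). The mean/std MFCC summary is a simple stand-in for the paper's context features, not the authors' exact algorithm.

```python
# Hypothetical UrbanSound8K loading and MFCC summary features.
# Paths and the feature summary are assumptions for illustration.
import librosa
import numpy as np
import pandas as pd

ROOT = "UrbanSound8K"  # directory obtained from the URL above
meta = pd.read_csv(f"{ROOT}/metadata/UrbanSound8K.csv")

def mfcc_features(row, n_mfcc=40):
    path = f"{ROOT}/audio/fold{row.fold}/{row.slice_file_name}"
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    # Mean/std over time as a crude clip-level summary feature
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

X = np.stack([mfcc_features(r) for r in meta.itertuples()])
y = meta["classID"].to_numpy()
```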



Funding

The authors did not receive support from any organization for the submitted work.

Author information


Corresponding author

Correspondence to Krishna Presannakumar.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Anuj Mohamed contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Presannakumar, K., Mohamed, A. Source identification of weak audio signals using attention based convolutional neural network. Appl Intell 53, 27044–27059 (2023). https://doi.org/10.1007/s10489-023-04973-y

