Abstract
Recent work has begun to clarify how the inferior colliculus (IC) processes sound. Here, we use neural temporal correlations in the inferior colliculus to identify and categorise the sound presented as a stimulus. Classification accuracy gradually deteriorates as stimulus duration shortens. We sought to improve these accuracies with deep learning methods designed for audio, using processing windows of 62.5 ms, 250 ms and 1000 ms. We demonstrate that 62.5 ms could be an integration time for temporal correlation. The neural data contains sound features that can readily be processed by artificial neural networks dedicated to audio signals. Used in transfer learning, network architectures dedicated to audio classification, such as YAMNet, VGGish and OpenL3, quickly yield very high classification accuracy on the neural data compared with image classification networks. Accuracy is highest for unshuffled correlation images; with noiseless shuffled correlation images, the OpenL3 network reaches 100% for 1000 ms, 96.7% for 250 ms and 93.8% for 62.5 ms. To evaluate how much each input feature of a neural network contributes to its outputs, we use explainable artificial intelligence. We apply three explainability methods, Grad-CAM, LIME and occlusion sensitivity, to obtain three sensitivity maps. The network relies on regions of very high or very low correlation to make its predictions.
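Of the three explainability methods named above, occlusion sensitivity is the simplest to sketch: slide a masking patch over the input image and record how much the class score drops when that region is hidden. The snippet below is a minimal, framework-free illustration of the idea; the `toy_score` function is a stand-in of our own invention for the trained audio CNN used in the study, not the actual model.

```python
import numpy as np

def occlusion_map(image, score_fn, patch=4, baseline=0.0):
    """Occlusion-sensitivity map: score drop when each patch is masked."""
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    ref = score_fn(image)                      # score on the intact image
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            # Large drop => the masked region mattered for the prediction.
            heat[i // patch, j // patch] = ref - score_fn(occluded)
    return heat

# Toy "model": the score is the mean activation of the top-left quadrant,
# so occluding that quadrant should dominate the sensitivity map.
def toy_score(img):
    return img[:8, :8].mean()

img = np.ones((16, 16))
heat = occlusion_map(img, toy_score, patch=8)
# heat[0, 0] is large; the other three cells are zero.
```

In the study, a map like `heat` is what highlights the very-high- and very-low-correlation regions the network relies on.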
Data availability
The data supporting the results of this study are available in “Multi-site neural recordings in the auditory midbrain of unanesthetised rabbits listening to natural texture sounds and sound correlation auditory models” on CRCNS.org [15].
References
De Cheveigné, A.: Structure du Système Auditif [Structure of the Auditory System] (2004)
Driscoll, M.E., Tadi, P.: Neuroanatomy, Inferior Colliculus – StatPearls. NCBI Bookshelf (2021)
Downer, J.D., Niwa, M., Sutter, M.L.: Task engagement selectively modulates neural correlations in primary auditory cortex. J. Neurosci. 35(19), 7565–7574 (2015). https://doi.org/10.1523/JNEUROSCI.4094-14.2015
Sadeghi, M., Zhai, X., Stevenson, I.H., Escabí, M.A.: A neural ensemble correlation code for sound category identification. PLoS Biol. (2019). https://doi.org/10.1371/journal.pbio.3000449
Wiki: Colliculus Inférieur [Inferior Colliculus]. https://stringfixer.com/fr/Brachium_of_the_inferior_colliculus (2022)
Schnupp, J., Nelken, I., King, A.J.: Auditory Neuroscience: Making Sense of Sound. The MIT Press (2011)
Heeringa, A.N., van Dijk, P.: Neural coding of the sound envelope is changed in the inferior colliculus immediately following acoustic trauma. Eur. J. Neurosci. 49(10), 1220–1232 (2019). https://doi.org/10.1111/ejn.14299
Zhai, X., et al.: Distinct neural ensemble response statistics are associated with recognition and discrimination of natural sound textures. Proc. Natl. Acad. Sci. USA (2020). https://doi.org/10.1073/pnas.2005644117/-/DCSupplemental
Shadlen, M.N., Newsome, W.T.: Neural basis of a perceptual decision in the parietal cortex (Area LIP) of the Rhesus Monkey. J. Neurophysiol. 86(4), 1916 (2001)
Özcan, F., Alkan, A.: Neural decoding of inferior colliculus multiunit activity for sound category identification with temporal correlation and deep learning. Biorxiv (2022). https://doi.org/10.1101/2022.08.24.505211
Livezey, J.A., Glaser, J.I.: Deep learning approaches for neural decoding: from CNNs to LSTMs and spikes to fMRI. http://arxiv.org/abs/2005.09687 (2020)
Ong, J.H., Goh, K.M., Lim, L.L.: Comparative analysis of explainable artificial intelligence for COVID-19 diagnosis on CXR image. IEEE ICSIPA (2021). https://doi.org/10.1109/ICSIPA52582.2021.9576766
Matlab: Deep Learning—Transfer Learning (2022)
Blackwell, J.M., Lesicko, A., Rao, W., De Biasi, M., Geffen, M.N.: Auditory cortex shapes sound responses in the inferior colliculus. Elife (2020). https://doi.org/10.7554/eLife.51890
Sadeghi, M., Zhai, X., Stevenson, I.H., Escabí, M.A.: Dataset: multi-site neural recordings in the auditory midbrain of unanesthetized rabbits listening to natural texture sounds and sound correlation auditory models (2019)
Kell, A.J., McDermott, J.H.: Deep neural network models of sensory systems: windows onto the role of task constraints. Curr. Opin. Neurobiol. 55, 121–132 (2019). https://doi.org/10.1016/j.conb.2019.02.003
McKearney, R.M., MacKinnon, R.C.: Objective auditory brainstem response classification using machine learning. Int. J. Audiol. (2019). https://doi.org/10.1080/14992027.2018.1551633
Bing, D., et al.: Predicting the hearing outcome in sudden sensorineural hearing loss via machine learning models. Clin. Otolaryngol. 43(3), 868–874 (2018). https://doi.org/10.1111/coa.13068
Shigemoto, N., Stoh, H., Shibata, K., Inoue, Y.: Study of deep learning for sound scale decoding technology from human brain auditory cortex. In: 2019 IEEE 1st Global Conference on Life Sciences and Technologies, LifeTech 2019. Institute of Electrical and Electronics Engineers Inc., pp. 212–213 (2019). https://doi.org/10.1109/LifeTech.2019.8884004
Faisal, A., Nora, A., Seol, J., Renvall, H., Salmelin, R.: Kernel convolution model for decoding sounds from time-varying neural responses. PRNI (2015). https://doi.org/10.1109/PRNI.2015.10
Tsalera, E., Papadakis, A., Samarakou, M.: Comparison of pre-trained CNNs for audio classification using transfer learning. J. Sens. Actuator Netw. (2021). https://doi.org/10.3390/jsan10040072
Peng, X., Xu, H., Liu, J., Wang, J., He, C.: Multi-class voice disorder classification using OpenL3-SVM (2022). https://ssrn.com/abstract=4047840
Syed, Z.S., Memon, S.A., Memon, A.L.: Deep acoustic embeddings for identifying Parkinsonian speech. Int. J. Adv. Comput. Sci. Appl. 11(10), 726–734 (2020)
Ding, Y., Lerch, A.: Audio embeddings as teachers for music classification (2023). http://arxiv.org/abs/2306.17424
Sahoo, S., Dandapat, S.: Detection of speech-based physical load using transfer learning approach. IEEE INDICON (2021). https://doi.org/10.1109/INDICON52576.2021.9691530
Shi, L., Du, K., Zhang, C., Ma, H., Yan, W.: Lung sound recognition algorithm based on VGGish-BiGRU. IEEE Access 7, 139438–139449 (2019). https://doi.org/10.1109/ACCESS.2019.2943492
CV, S., Rao, P., Velmurugan, R.: Classroom activity detection in noisy preschool environments with audio analysis
Jiechieu, F., Tsopze, N.: Une approche basée sur la méthode LRP pour l’explication des Réseaux de Neurones Convolutifs appliqués à la classification des textes [An LRP-based approach for explaining convolutional neural networks applied to text classification] (2022). https://hal.archives-ouvertes.fr/hal-03701361
Thibeau-Sutre, E., Collin, S., Burgos, N., Colliot, O.: Interpretability of machine learning methods applied to neuroimaging (2022). http://arxiv.org/abs/2204.07005
Li, X., et al.: Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond (2021). http://arxiv.org/abs/2103.10689
Buhrmester, V., Münch, D., Arens, M.: Analysis of explainers of black box deep neural networks for computer vision: a survey (2019). http://arxiv.org/abs/1911.12116
Henna, S., Alcaraz, J.M.L.: From interpretable filters to predictions of convolutional neural networks with explainable artificial intelligence (2022). http://arxiv.org/abs/2207.12958
Ilias, L., Askounis, D.: Explainable identification of dementia from transcripts using transformer networks (2021). https://doi.org/10.1109/JBHI.2022.3172479
Ellis, Chowdhry: YAMNet. GitHub, TensorFlow models repository. https://github.com/tensorflow/models/tree/master/research/audioset/yamnet
Hershey: VGGish. GitHub, TensorFlow models repository. https://github.com/tensorflow/models/tree/master/research/audioset/vggish
Hershey, S., et al.: CNN architectures for large-scale audio classification. IEEE ICASSP (2017)
Weck, B., Favory, X., Drossos, K., Serra, X.: Evaluating off-the-shelf machine listening and natural language models for automated audio captioning (2021). http://arxiv.org/abs/2110.07410
Cramer, J.: OpenL3. GitHub, marl repository. https://github.com/marl/openl3
Cramer, J., Wu, H.H., Salamon, J., Bello, J.P.: Look, listen, and learn more: design choices for deep audio embeddings. IEEE, p. 7020 (2019)
What Is Mean And Standard Deviation In Image Processing. https://www.icsid.org/uncategorized/what-is-mean-and-standard-deviation-in-image-processing (2022)
Albouy, P., Benjamin, L., Morillon, B., Zatorre, R.J.: Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. Science 367, 1043 (2020)
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
All authors contributed to the development of the study. Material preparation, data collection and analysis were carried out by FÖ. The work was supervised by AA. The first draft of the manuscript was written by FÖ. All authors have read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Supplementary information
Additional figures and tables can be found in the Supplementary Information file.
Other information
This work is based on a thesis.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Özcan, F., Alkan, A. Explainable audio CNNs applied to neural decoding: sound category identification from inferior colliculus. SIViP 18, 1193–1204 (2024). https://doi.org/10.1007/s11760-023-02825-3