Abstract
Recent work has begun to clarify how the inferior colliculus (IC) processes sound. Here, we use neural temporal correlations in the inferior colliculus to identify and categorise the sound presented as a stimulus. Classification accuracy gradually deteriorates as stimulus duration shortens. We sought to improve these accuracies with deep learning methods designed for audio, using processing windows of 62.5 ms, 250 ms and 1000 ms. We demonstrate that 62.5 ms could be an integration time for temporal correlation. The neural data contains sound features that can readily be processed by artificial neural networks dedicated to audio signals. Used in transfer learning, network architectures dedicated to audio classification, such as YAMNet, VGGish and OpenL3, quickly yield very high classification accuracy on the neural data compared with image classification networks. Accuracy is highest for unshuffled correlation images; with noiseless shuffled correlation images, the OpenL3 network reaches 100% for 1000 ms, 96.7% for 250 ms and 93.8% for 62.5 ms. To evaluate how much each input feature of a neural network contributes to its outputs, we use explainable artificial intelligence. We apply three explainability methods, Grad-CAM, LIME and occlusion sensitivity, to obtain three sensitivity maps. The network relies on regions of very high or very low correlation to make its predictions.
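Of the three explainability methods named above, occlusion sensitivity is the simplest to sketch: slide a masking patch over the input image and record how much the class score drops when that region is hidden. The snippet below is a minimal, framework-free illustration of the idea; the `toy_score` function is a stand-in of our own invention for the trained audio CNN used in the study, not the actual model.

```python
import numpy as np

def occlusion_map(image, score_fn, patch=4, baseline=0.0):
    """Occlusion-sensitivity map: score drop when each patch is masked."""
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    ref = score_fn(image)                      # score on the intact image
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            # Large drop => the masked region mattered for the prediction.
            heat[i // patch, j // patch] = ref - score_fn(occluded)
    return heat

# Toy "model": the score is the mean activation of the top-left quadrant,
# so occluding that quadrant should dominate the sensitivity map.
def toy_score(img):
    return img[:8, :8].mean()

img = np.ones((16, 16))
heat = occlusion_map(img, toy_score, patch=8)
# heat[0, 0] is large; the other three cells are zero.
```

In the study, a map like `heat` is what highlights the very-high- and very-low-correlation regions the network relies on.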
Data availability
The data supporting the results of this study are available in “Multi-site neural recordings in the auditory midbrain of unanesthetised rabbits listening to natural texture sounds and sound correlation auditory models” on CRCNS.org [15].
References
De Cheveigné, A.: Structure du Système Auditif [Structure of the Auditory System] (2004)
Driscoll, M.E., Tadi, P.: Neuroanatomy, Inferior Colliculus – StatPearls. NCBI Bookshelf (2021)
Downer, J.D., Niwa, M., Sutter, M.L.: Task engagement selectively modulates neural correlations in primary auditory cortex. J. Neurosci. 35(19), 7565–7574 (2015). https://doi.org/10.1523/JNEUROSCI.4094-14.2015
Sadeghi, M., Zhai, X., Stevenson, I.H., Escabí, M.A.: A neural ensemble correlation code for sound category identification. PLoS Biol. (2019). https://doi.org/10.1371/journal.pbio.3000449
Wiki: Colliculus Inférieur [Inferior Colliculus]. https://stringfixer.com/fr/Brachium_of_the_inferior_colliculus (2022)
Schnupp, J., Nelken, I., King, A.J.: Auditory Neuroscience: Making Sense of Sound. The MIT Press (2011)
Heeringa, A.N., van Dijk, P.: Neural coding of the sound envelope is changed in the inferior colliculus immediately following acoustic trauma. Eur. J. Neurosci. 49(10), 1220–1232 (2019). https://doi.org/10.1111/ejn.14299
Zhai, X., et al.: Distinct neural ensemble response statistics are associated with recognition and discrimination of natural sound textures. Proc. Natl. Acad. Sci. USA (2020). https://doi.org/10.1073/pnas.2005644117/-/DCSupplemental
Shadlen, M.N., Newsome, W.T.: Neural basis of a perceptual decision in the parietal cortex (Area LIP) of the Rhesus Monkey. J. Neurophysiol. 86(4), 1916 (2001)
Özcan, F., Alkan, A.: Neural decoding of inferior colliculus multiunit activity for sound category identification with temporal correlation and deep learning. Biorxiv (2022). https://doi.org/10.1101/2022.08.24.505211
Livezey, J.A., Glaser, J.I.: Deep learning approaches for neural decoding: from CNNs to LSTMs and spikes to fMRI. http://arxiv.org/abs/2005.09687 (2020)
Ong, J.H., Goh, K.M., Lim, L.L.: Comparative analysis of explainable artificial intelligence for COVID-19 diagnosis on CXR image. IEEE ICSIPA (2021). https://doi.org/10.1109/ICSIPA52582.2021.9576766
Matlab: Deep Learning—Transfer Learning (2022)
Blackwell, J.M., Lesicko, A., Rao, W., De Biasi, M., Geffen, M.N.: Auditory cortex shapes sound responses in the inferior colliculus. Elife (2020). https://doi.org/10.7554/eLife.51890
Sadeghi, M., Zhai, X., Stevenson, I.H., Escabí, M.A.: Dataset: multi-site neural recordings in the auditory midbrain of unanesthetized rabbits listening to natural texture sounds and sound correlation auditory models (2019)
Kell, A.J., McDermott, J.H.: Deep neural network models of sensory systems: windows onto the role of task constraints. Curr. Opin. Neurobiol. 55, 121–132 (2019). https://doi.org/10.1016/j.conb.2019.02.003
McKearney, R.M., MacKinnon, R.C.: Objective auditory brainstem response classification using machine learning. Int. J. Audiol. (2019). https://doi.org/10.1080/14992027.2018.1551633
Bing, D., et al.: Predicting the hearing outcome in sudden sensorineural hearing loss via machine learning models. Clin. Otolaryngol. 43(3), 868–874 (2018). https://doi.org/10.1111/coa.13068
Shigemoto, N., Stoh, H., Shibata, K., Inoue, Y.: Study of deep learning for sound scale decoding technology from human brain auditory cortex. In: 2019 IEEE 1st Global Conference on Life Sciences and Technologies, LifeTech 2019. Institute of Electrical and Electronics Engineers Inc., pp. 212–213 (2019). https://doi.org/10.1109/LifeTech.2019.8884004
Faisal, A., Nora, A., Seol, J., Renvall, H., Salmelin, R.: Kernel convolution model for decoding sounds from time-varying neural responses. PRNI (2015). https://doi.org/10.1109/PRNI.2015.10
Tsalera, E., Papadakis, A., Samarakou, M.: Comparison of pre-trained CNNs for audio classification using transfer learning. J. Sens. Actuator Netw. (2021). https://doi.org/10.3390/jsan10040072
Peng, X., Xu, H., Liu, J., Wang, J., He, C.: Multi-class voice disorder classification using OpenL3-SVM (2022). https://ssrn.com/abstract=4047840
Syed, Z.S., Memon, S.A., Memon, A.L.: Deep acoustic embeddings for identifying Parkinsonian speech. Int. J. Adv. Comput. Sci. Appl. 11(10), 726–734 (2020)
Ding, Y., Lerch, A.: Audio embeddings as teachers for music classification (2023). http://arxiv.org/abs/2306.17424
Sahoo, S., Dandapat, S.: Detection of speech-based physical load using transfer learning approach. IEEE INDICON (2021). https://doi.org/10.1109/INDICON52576.2021.9691530
Shi, L., Du, K., Zhang, C., Ma, H., Yan, W.: Lung sound recognition algorithm based on VGGish-BiGRU. IEEE Access 7, 139438–139449 (2019). https://doi.org/10.1109/ACCESS.2019.2943492
CV, S., Rao, P., Velmurugan, R.: Classroom activity detection in noisy preschool environments with audio analysis
Jiechieu, F., Tsopze, N.: Une approche basée sur la méthode LRP pour l’explication des Réseaux de Neurones Convolutifs appliqués à la classification des textes [An LRP-based approach for explaining convolutional neural networks applied to text classification] (2022). https://hal.archives-ouvertes.fr/hal-03701361
Thibeau-Sutre, E., Collin, S., Burgos, N., Colliot, O.: Interpretability of machine learning methods applied to neuroimaging (2022). http://arxiv.org/abs/2204.07005
Li, X., et al.: Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond (2021). http://arxiv.org/abs/2103.10689
Buhrmester, V., Münch, D., Arens, M.: Analysis of explainers of black box deep neural networks for computer vision: a survey (2019). http://arxiv.org/abs/1911.12116
Henna, S., Alcaraz, J.M.L.: From interpretable filters to predictions of convolutional neural networks with explainable artificial intelligence (2022). http://arxiv.org/abs/2207.12958
Ilias, L., Askounis, D.: Explainable identification of dementia from transcripts using transformer networks (2021). https://doi.org/10.1109/JBHI.2022.3172479
Ellis, Chowdhry: YAMNet. GitHub, TensorFlow models repository. https://github.com/tensorflow/models/tree/master/research/audioset/yamnet
Hershey: VGGish. GitHub, TensorFlow models repository. https://github.com/tensorflow/models/tree/master/research/audioset/vggish
Hershey, S., et al.: CNN architectures for large-scale audio classification. IEEE ICASSP (2017)
Weck, B., Favory, X., Drossos, K., Serra, X.: Evaluating off-the-shelf machine listening and natural language models for automated audio captioning (2021). http://arxiv.org/abs/2110.07410
Cramer, J.: OpenL3. GitHub, marl repository. https://github.com/marl/openl3
Cramer, J., Wu, H.H., Salamon, J., Bello, J.P.: Look, listen, and learn more: design choices for deep audio embeddings. IEEE, p. 7020 (2019)
What Is Mean And Standard Deviation In Image Processing. https://www.icsid.org/uncategorized/what-is-mean-and-standard-deviation-in-image-processing (2022)
Albouy, P., Benjamin, L., Morillon, B., Zatorre, R.J.: Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. Science 367, 1043 (2020)
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
All authors contributed to the development of the study. Material preparation, data collection and analysis were carried out by FÖ. The work was supervised by AA. The first draft of the manuscript was written by FÖ. All authors have read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Supplementary information
Additional figures and tables can be found in the Supplementary Information file.
Other information
This work is based on a thesis.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Özcan, F., Alkan, A. Explainable audio CNNs applied to neural decoding: sound category identification from inferior colliculus. SIViP 18, 1193–1204 (2024). https://doi.org/10.1007/s11760-023-02825-3