
Explainable audio CNNs applied to neural decoding: sound category identification from inferior colliculus

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Recent work has sought to understand how the inferior colliculus (IC) processes sound. Here, we use neural temporal correlation in the IC to identify and categorise the sound that was presented as a stimulus. Classification accuracy gradually deteriorates as the stimulus duration shortens, so we tried to improve it with deep learning methods dedicated to audio, applied to processing windows of 62.5 ms, 250 ms and 1000 ms, and we demonstrate that 62.5 ms could be an integration time for temporal correlation. The neural data contain sound features that can readily be processed by artificial neural networks designed for audio signals. Network architectures dedicated to audio classification, such as YamNet, VGGish and OpenL3, used in transfer learning, reach very high classification accuracy on the neural data far more quickly than image-classification networks. We obtained the best accuracy with unshuffled correlation images; with noiseless shuffled correlation images, accuracy reached 100% for 1000 ms, 96.7% for 250 ms and 93.8% for 62.5 ms, all with the OpenL3 network. To evaluate how much each input feature of a network contributes to its outputs, we use Explainable Artificial Intelligence, applying three explainability methods, Grad-CAM, LIME and occlusion sensitivity, to obtain three sensitivity maps. The network bases its predictions on regions of very high or very low correlation.
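
To make the transfer-learning pipeline concrete, the Python sketch below is a minimal illustration, not the authors' published code: a frozen pretrained audio network turns each correlation image into a fixed-length embedding, and a shallow classifier is trained on those embeddings. Here extract_embedding is a toy stand-in (coarse average pooling) for the real OpenL3, VGGish or YamNet forward pass; the function and variable names are ours.

    # Minimal transfer-learning sketch: frozen-embedding features plus a
    # shallow classifier. The stand-in embedding below is illustrative only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    def extract_embedding(correlation_image):
        # Stand-in embedding: crop to a multiple of 8, average-pool to an
        # 8x8 grid, flatten to 64 values. In the paper's pipeline this step
        # would be a frozen OpenL3 / VGGish / YamNet forward pass.
        h, w = correlation_image.shape
        h8, w8 = h - h % 8, w - w % 8
        blocks = correlation_image[:h8, :w8].reshape(8, h8 // 8, 8, w8 // 8)
        return blocks.mean(axis=(1, 3)).ravel()

    def classify_correlation_images(images, labels):
        # One feature vector per correlation image (one image per
        # 62.5 ms, 250 ms or 1000 ms processing window).
        X = np.stack([extract_embedding(img) for img in images])
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, labels, test_size=0.2, stratify=labels, random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        return accuracy_score(y_te, clf.predict(X_te))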

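Of the three explainability methods named in the abstract, occlusion sensitivity is the simplest to reproduce from first principles. The sketch below is our illustration, not the paper's implementation: it slides a masking patch across a correlation image and records how far the target-class score falls, so large heat values mark regions the network relies on for its prediction. predict_proba stands for any trained model's image-to-class-probabilities function; the patch size and stride are arbitrary choices.

    # Occlusion-sensitivity sketch: mask patches of the input and measure
    # the drop in the target-class score. predict_proba is any callable
    # mapping an image to a vector of class probabilities.
    import numpy as np

    def occlusion_map(image, predict_proba, target_class,
                      patch=8, stride=4, fill=0.0):
        base = predict_proba(image)[target_class]  # unoccluded reference
        h, w = image.shape
        rows = (h - patch) // stride + 1
        cols = (w - patch) // stride + 1
        heat = np.zeros((rows, cols))
        for i in range(rows):
            for j in range(cols):
                occluded = image.copy()
                occluded[i * stride:i * stride + patch,
                         j * stride:j * stride + patch] = fill
                # Score drop = importance of the masked region.
                heat[i, j] = base - predict_proba(occluded)[target_class]
        return heat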

Data availability

The data supporting the results of this study are available in “Multi-site neural recordings in the auditory midbrain of unanesthetized rabbits listening to natural texture sounds and sound correlation auditory models” on CRCNS.org [15].

References

  1. De Cheveigné, A.: Structure du Système Auditif [Structure of the Auditory System] (2004)

  2. Driscoll, M.E., Tadi, P.: Neuroanatomy, Inferior Colliculus – StatPearls. NCBI Bookshelf (2021)

  3. Downer, J.D., Niwa, M., Sutter, M.L.: Task engagement selectively modulates neural correlations in primary auditory cortex. J. Neurosci. 35(19), 7565–7574 (2015). https://doi.org/10.1523/JNEUROSCI.4094-14.2015

  4. Sadeghi, M., Zhai, X., Stevenson, I.H., Escabí, M.A.: A neural ensemble correlation code for sound category identification. PLoS Biol. (2019). https://doi.org/10.1371/journal.pbio.3000449

  5. Wiki: Colliculus Inférieur [Inferior Colliculus]. https://stringfixer.com/fr/Brachium_of_the_inferior_colliculus (2022)

  6. Schnupp, J., Nelken, I., King, A.J.: Auditory Neuroscience: Making Sense of Sound. The MIT Press (2011)

  7. Heeringa, A.N., van Dijk, P.: Neural coding of the sound envelope is changed in the inferior colliculus immediately following acoustic trauma. Eur. J. Neurosci. 49(10), 1220–1232 (2019). https://doi.org/10.1111/ejn.14299

  8. Zhai, X., et al.: Distinct neural ensemble response statistics are associated with recognition and discrimination of natural sound textures. Proc. Natl. Acad. Sci. USA (2020). https://doi.org/10.1073/pnas.2005644117

  9. Shadlen, M.N., Newsome, W.T.: Neural basis of a perceptual decision in the parietal cortex (Area LIP) of the Rhesus Monkey. J. Neurophysiol. 86(4), 1916 (2001)

  10. Özcan, F., Alkan, A.: Neural decoding of inferior colliculus multiunit activity for sound category identification with temporal correlation and deep learning. bioRxiv (2022). https://doi.org/10.1101/2022.08.24.505211

  11. Livezey, J.A., Glaser, J.I.: Deep learning approaches for neural decoding: from CNNs to LSTMs and spikes to fMRI. http://arxiv.org/abs/2005.09687 (2020)

  12. Ong, J.H., Goh, K.M., Lim, L.L.: Comparative analysis of explainable artificial intelligence for COVID-19 diagnosis on CXR image. IEEE ICSIPA (2021). https://doi.org/10.1109/ICSIPA52582.2021.9576766

  13. Matlab: Deep Learning—Transfer Learning (2022)

  14. Blackwell, J.M., Lesicko, A., Rao, W., De Biasi, M., Geffen, M.N.: Auditory cortex shapes sound responses in the inferior colliculus. Elife (2020). https://doi.org/10.7554/eLife.51890

  15. Sadeghi, M., Zhai, X., Stevenson, I.H., Escabi, M.A.: Dataset: multi-site neural recordings in the auditory midbrain of unanesthetized rabbits listening to natural texture sounds and sound correlation auditory models. CRCNS.org (2019)

  16. Kell, A.J., McDermott, J.H.: Deep neural network models of sensory systems: windows onto the role of task constraints. Curr. Opin. Neurobiol. 55, 121–132 (2019). https://doi.org/10.1016/j.conb.2019.02.003

  17. McKearney, R.M., MacKinnon, R.C.: Objective auditory brainstem response classification using machine learning. Int. J. Audiol. (2019). https://doi.org/10.1080/14992027.2018.1551633

  18. Bing, D., et al.: Predicting the hearing outcome in sudden sensorineural hearing loss via machine learning models. Clin. Otolaryngol. 43(3), 868–874 (2018). https://doi.org/10.1111/coa.13068

  19. Shigemoto, N., Stoh, H., Shibata, K., Inoue, Y.: Study of deep learning for sound scale decoding technology from human brain auditory cortex. In: 2019 IEEE 1st Global Conference on Life Sciences and Technologies, LifeTech 2019. Institute of Electrical and Electronics Engineers Inc., pp. 212–213 (2019). https://doi.org/10.1109/LifeTech.2019.8884004

  20. Faisal, A., Nora, A., Seol, J., Renvall, H., Salmelin, R.: Kernel convolution model for decoding sounds from time-varying neural responses. PRNI (2015). https://doi.org/10.1109/PRNI.2015.10

  21. Tsalera, E., Papadakis, A., Samarakou, M.: Comparison of pre-trained CNNs for audio classification using transfer learning. J. Sens. Actuator Netw. (2021). https://doi.org/10.3390/jsan10040072

  22. Peng, X., Xu, H., Liu, J., Wang, J., He, C.: Multi-class voice disorder classification using OpenL3-SVM (2022). https://ssrn.com/abstract=4047840

  23. Syed, Z.S., Memon, S.A., Memon, A.L.: Deep acoustic embeddings for identifying Parkinsonian speech. Int. J. Adv. Comput. Sci. Appl. 11(10), 726–734 (2020)

  24. Ding, Y., Lerch, A.: Audio embeddings as teachers for music classification (2023). http://arxiv.org/abs/2306.17424

  25. Sahoo, S., Dandapat, S.: Detection of speech-based physical load using transfer learning approach. IEEE INDICON (2021). https://doi.org/10.1109/INDICON52576.2021.9691530

  26. Shi, L., Du, K., Zhang, C., Ma, H., Yan, W.: Lung sound recognition algorithm based on VGGish-BiGRU. IEEE Access 7, 139438–139449 (2019). https://doi.org/10.1109/ACCESS.2019.2943492

  27. CV, S., Rao, P., Velmurugan, R.: Classroom activity detection in noisy preschool environments with audio analysis

  28. Jiechieu, F., Tsopze, N.: Une approche basée sur la méthode LRP pour l’explication des Réseaux de Neurones Convolutifs appliqués à la classification des textes [An LRP-based approach to explaining convolutional neural networks applied to text classification] (2022). https://hal.archives-ouvertes.fr/hal-03701361

  29. Thibeau-Sutre, E., Collin, S., Burgos, N., Colliot, O.: Interpretability of machine learning methods applied to neuroimaging (2022). http://arxiv.org/abs/2204.07005

  30. Li, X., et al.: Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond (2021). http://arxiv.org/abs/2103.10689

  31. Buhrmester, V., Münch, D., Arens, M.: Analysis of explainers of black box deep neural networks for computer vision: a survey (2019). http://arxiv.org/abs/1911.12116

  32. Henna, S., Alcaraz, J.M.L.: From interpretable filters to predictions of convolutional neural networks with explainable artificial intelligence (2022). http://arxiv.org/abs/2207.12958

  33. Ilias, L., Askounis, D.: Explainable identification of dementia from transcripts using transformer networks. IEEE J. Biomed. Health Inform. (2022). https://doi.org/10.1109/JBHI.2022.3172479

  34. Ellis and Chowdhry: YAMNet. GitHub, TensorFlow models repository. https://github.com/tensorflow/models/tree/master/research/audioset/yamnet

  35. Hershey, S.: VGGish. GitHub, TensorFlow models repository. https://github.com/tensorflow/models/tree/master/research/audioset/vggish

  36. Hershey, S., et al.: CNN architectures for large-scale audio classification. In: IEEE ICASSP (2017)

  37. Weck, B., Favory, X., Drossos, K., Serra, X.: Evaluating off-the-shelf machine listening and natural language models for automated audio captioning (2021). http://arxiv.org/abs/2110.07410

  38. Cramer, J.: OpenL3. GitHub, MARL repository. https://github.com/marl/openl3

  39. Cramer, J., Wu, H.H., Salamon, J., Bello, J.P.: Look, listen and learn more: design choices for deep audio embeddings. In: IEEE ICASSP (2019)

  40. What Is Mean And Standard Deviation In Image Processing. https://www.icsid.org/uncategorized/what-is-mean-and-standard-deviation-in-image-processing (2022)

  41. Albouy, P., Benjamin, L., Morillon, B., Zatorre, R.J.: Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. Science 367, 1043 (2020)

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Contributions

All authors contributed to the development of the study. Material preparation, data collection and analysis were carried out by FÖ. The work was supervised by AA. The first draft of the manuscript was written by FÖ. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Fatma Özcan.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary information

Additional figures and tables can be found in the Supplementary Information file (Supplementary file 1, DOCX 11,061 KB).

Other information

This work is based on the first author's thesis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Özcan, F., Alkan, A. Explainable audio CNNs applied to neural decoding: sound category identification from inferior colliculus. SIViP 18, 1193–1204 (2024). https://doi.org/10.1007/s11760-023-02825-3
