Scale-Space Theory for Auditory Signals

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9087)

Abstract

We show how the axiomatic structure of scale-space theory can be applied to the auditory domain and used to derive idealized models of auditory receptive fields from scale-space principles. For defining a time-frequency transformation of a purely temporal signal, it is shown that the scale-space framework allows for a new way of deriving the Gabor and Gammatone filters, as well as a novel family of generalized Gammatone filters with additional degrees of freedom to obtain different trade-offs between the spectral selectivity and the temporal delay of time-causal window functions. Applied to the definition of a second layer of receptive fields from the spectrogram, it is shown that the scale-space framework leads to two canonical families of spectro-temporal receptive fields, which combine Gaussian filters over the log-spectral domain with either Gaussian filters or a cascade of first-order integrators over the temporal domain. These spectro-temporal receptive fields can either be separable over the time-frequency domain or be adapted to local glissando transformations that represent variations in logarithmic frequencies over time. Such idealized models of auditory receptive fields respect auditory invariances, can be used for computing basic auditory features for audio processing, and lead to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields in the inferior colliculus (ICC) and the primary auditory cortex (A1).
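As an illustration of the two window types contrasted above, the following is a minimal sketch, not the paper's implementation. It assumes the standard textbook forms: a Gabor filter as a Gaussian window multiplied by a complex sinusoid, and a Gammatone filter as the time-causal window produced by a cascade of k identical first-order integrators (a Gamma-type kernel) multiplied by a complex sinusoid. The function names, the parameter names (omega, sigma, mu, k), and the unit-area normalization are assumptions made for this example.

```python
import numpy as np
from math import gamma


def gabor_kernel(t, omega, sigma):
    """Non-causal Gabor kernel: unit-area Gaussian window times a complex sinusoid.

    omega is the angular frequency, sigma the temporal standard deviation
    (both names are illustrative assumptions).
    """
    t = np.asarray(t, dtype=float)
    window = np.exp(-t**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return window * np.exp(-1j * omega * t)


def gammatone_kernel(t, omega, mu, k):
    """Time-causal Gammatone-type kernel.

    The window is the impulse response of k first-order integrators in
    cascade, all with time constant mu: t^(k-1) exp(-t/mu) / (mu^k Gamma(k))
    for t >= 0, and zero for t < 0. Its mean (temporal delay) is k * mu.
    """
    t = np.asarray(t, dtype=float)
    tp = np.clip(t, 0.0, None)  # avoid overflow and negative powers for t < 0
    window = np.where(
        t >= 0.0,
        tp ** (k - 1) * np.exp(-tp / mu) / (mu**k * gamma(k)),
        0.0,
    )
    return window * np.exp(-1j * omega * t)


if __name__ == "__main__":
    dt = 1e-4
    t = np.arange(-1.0, 1.0, dt)
    h = gammatone_kernel(t, omega=2.0 * np.pi * 440.0, mu=0.005, k=4)
    # Temporal delay of the causal window, estimated as the mean of |h|.
    delay = np.sum(t * np.abs(h)) * dt
    print(f"estimated temporal delay: {delay:.4f} s")
```

In this sketch, increasing k at a fixed mu gives a smoother window at the cost of a longer temporal delay, which is one simple way to see the trade-off between spectral selectivity and delay that the generalized Gammatone family is meant to control.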

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
  2. Department of Speech, Music and Hearing, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
