Auditory-Based Time Frequency Transform

  • Qi (Peter) Li
Part of the Signals and Communication Technology book series (SCT)


Time-frequency transforms play an important role in signal processing. Many speech processing algorithms need to convert the time-domain speech signal to the frequency domain. The Fourier transform (FT) and the fast Fourier transform (FFT) have been used for this purpose for decades, but they are not robust to background noise. As shown in this chapter, the FFT also introduces computational noise and pitch harmonics into the resulting spectrum. In a different approach, the traveling wave in the cochlea has been modeled as a Gammatone function, and a bank of such functions has been used as a forward transform to decompose the input signal into different frequency bands. However, there is no proven inverse transform for the Gammatone filter bank, and its filter bandwidths are fixed and cannot be adjusted for different kinds of applications. To address these issues, the author presented a robust, invertible, auditory-based time-frequency transform, named the auditory-based transform or auditory transform (AT), in [23, 22]. In this chapter, we provide a detailed introduction to the AT.
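To make the conventional Gammatone decomposition mentioned in the abstract concrete, the following is a minimal sketch of a Gammatone filter bank used as a forward transform. It is not the AT and is not taken from the chapter. The fourth-order impulse response, the ERB bandwidth formula, the 1.019 bandwidth factor, the 25 ms truncation, and all function names (`erb_bandwidth`, `gammatone_ir`, `filter_signal`, `band_energies`) are assumptions chosen for illustration, following common practice in the auditory-modeling literature.

```python
import math

def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth in Hz (assumed ERB formula, not from the chapter)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs_hz, duration_s=0.025, order=4):
    """Sampled gammatone impulse response t^(n-1) exp(-2*pi*b*t) cos(2*pi*fc*t),
    truncated to duration_s and scaled to unit peak magnitude."""
    b = 1.019 * erb_bandwidth(fc_hz)  # bandwidth is tied to fc and cannot be tuned freely
    n_taps = int(round(duration_s * fs_hz))
    ir = []
    for k in range(n_taps):
        t = k / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]

def filter_signal(signal, ir):
    """Direct FIR convolution, truncated to the length of the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(min(i + 1, len(ir))):
            acc += ir[j] * signal[i - j]
        out.append(acc)
    return out

def band_energies(signal, centre_freqs_hz, fs_hz):
    """Forward transform: one band-limited output per centre frequency, summarised by its energy."""
    energies = []
    for fc in centre_freqs_hz:
        band = filter_signal(signal, gammatone_ir(fc, fs_hz))
        energies.append(sum(v * v for v in band))
    return energies

# Usage: a 1 kHz tone sampled at 8 kHz should excite the 1 kHz band most strongly.
fs = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * k / fs) for k in range(800)]
centres = [250.0, 500.0, 1000.0, 2000.0]
energies = band_energies(tone, centres, fs)
strongest = centres[energies.index(max(energies))]
```

In this sketch each filter's bandwidth is fully determined by its centre frequency, which illustrates the fixed-bandwidth limitation noted in the abstract. The sketch also provides only the forward direction, with no accompanying inverse.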


Keywords (added by machine, not by the authors): Fast Fourier Transform; Basilar Membrane; Speaker Recognition; Feature Extraction Algorithm; Fast Fourier Transform Spectrum



References

  1. Allen, J.: “Cochlear modeling”. IEEE ASSP Magazine, pp. 3–29, Jan. 1985
  2. Barbour, D. L., Wang, X.: “Contrast tuning in auditory cortex”. Science 299, 1073–1075 (2003)
  3. Bruce, I., Sachs, M., Young, E.: “An auditory-periphery model of the effects of acoustic trauma on auditory nerve responses”. J. Acoust. Soc. Am. 113, 369–388 (2003)
  4. Choueiter, G. F., Glass, J. R.: “An implementation of rational wavelets and filter design for phonetic classification”. IEEE Trans. on Audio, Speech, and Language Processing 15, 939–948 (2007)
  5. Daubechies, I., Maes, S.: “A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models”. In: A. Aldroubi, M. Unser (eds.) Wavelets in Medicine and Biology. CRC Press, pp. 527–546 (1996)
  6. Davis, S. B., Mermelstein, P.: “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences”. IEEE Trans. on Acoustics, Speech, and Signal Processing ASSP-28, 357–366 (1980)
  7. Evans, E. F.: “Frequency selectivity at high signal levels of single units in cochlear nerve and cochlear nucleus”. In: E. F. Evans, J. P. Wilson (eds.) Psychophysics and Physiology of Hearing. Academic Press, London, UK, pp. 195–192 (1977)
  8. Flanagan, J. L.: Speech analysis synthesis and perception. Springer-Verlag, New York (1972)
  9. Fletcher, H.: Speech and hearing in communication. Acoustical Society of America (1995)
  10. Furui, S.: “Cepstral analysis techniques for automatic speaker verification”. IEEE Trans. Acoust., Speech, Signal Processing 27, 254–277 (1981)
  11. Gelfand, S. A.: Hearing, an introduction to psychological and physiological acoustics. 3rd edn. Marcel Dekker, New York (1998)
  12. Ghitza, O.: “Auditory models and human performance in tasks related to speech coding and speech recognition”. IEEE Trans. on Speech and Audio Processing 2, 115–132 (1994)
  13. Goldstein, J. L.: “Modeling rapid waveform compression on the basilar membrane as a multiple-bandpass-nonlinear filtering”. Hearing Res. 49, 39–60 (1990)
  14. Hermansky, H., Morgan, N.: “RASTA processing of speech”. IEEE Trans. Speech and Audio Proc. 2, 578–589 (1994)
  15. Hohmann, V.: “Frequency analysis and synthesis using a Gammatone filterbank”. Acta Acustica united with Acustica 88, 433–442 (2002)
  16. Johannesma, P. I. M.: “The pre-response stimulus ensemble of neurons in the cochlear nucleus”. In: Proceedings of the Symposium on Hearing Theory, IPO, pp. 58–69 (1972)
  17. Johnson, R. A., Wichern, D. W.: Applied Multivariate Statistical Analysis. 3rd edn. Prentice Hall, New Jersey (1988)
  18. Kates, J. M.: “Accurate tuning curves in cochlea model”. IEEE Trans. on Speech and Audio Processing 1, 453–462 (1993)
  19. Kates, J. M.: “A time-domain digital cochlea model”. IEEE Trans. on Signal Processing 39, 2573–2592 (1991)
  20. Khanna, S. M., Leonard, D. G. B.: “Basilar membrane tuning in the cat cochlea”. Science 215, 305–306 (Jan. 1982)
  21. Kiang, N. Y.-S.: Discharge patterns of single fibers in the cat’s auditory nerve. 3rd edn. MIT, MA (1965)
  22. Li, Q.: “An auditory-based transform for audio signal processing”. In: Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (New Paltz, NY), Oct. 2009
  23. Li, Q.: “Solution for pervasive speaker recognition”. SBIR Phase I Proposal, Submitted to NSF IT.F4, Li Creative Technologies, Inc., NJ, June (2003)
  24. Li, Q., Huang, Y.: “An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions”. IEEE Trans. on Audio, Speech and Language Processing, Sept. 2011
  25. Li, Q., Huang, Y.: “Robust speaker identification using an auditory-based feature”. In: ICASSP 2010 (2010)
  26. Li, Q., Soong, F. K., Siohan, O.: “An auditory system-based feature for robust speech recognition”. In: Proc. 7th European Conf. on Speech Communication and Technology (Denmark), pp. 619–622, Sept. (2001)
  27. Li, Q., Soong, F. K., Siohan, O.: “A high-performance auditory feature for robust speech recognition”. In: Proceedings of 6th Int’l Conf. on Spoken Language Processing (Beijing), pp. III 51–54, Oct. 2000
  28. Li, Q., Zheng, J., Tsai, A., Zhou, Q.: “Robust endpoint detection and energy normalization for real-time speech and speaker recognition”. IEEE Trans. on Speech and Audio Processing 10, 146–157 (2002)
  29. Lin, J., Ki, W.-H., Edwards, T., Shamma, S.: “Analog VLSI implementations of auditory wavelet transforms using switched-capacitor circuits”. IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications 41, 572–583 (1994)
  30. Liu, W., Andreou, A. G., Goldstein, M. H., Jr.: “Voiced-speech representation by an analog silicon model of the auditory periphery”. IEEE Trans. on Neural Networks 3, 477–487 (1992)
  31. Lyon, R. F., Mead, C.: “An analog electronic cochlea”. IEEE Trans. on Acoustics, Speech, and Signal Processing 36, 1119–1134 (1988)
  32. Mak, B., Tam, Y.-C., Li, Q.: “Discriminative auditory features for robust speech recognition”. IEEE Trans. on Speech and Audio Processing 12, 27–36 (2004)
  33. Misiti, M., Misiti, Y., Oppenheim, G., Poggi, J.-M.: Wavelet Toolbox User’s Guide. 3rd edn. MathWorks, MA (2006)
  34. Møller, A. R.: “Frequency selectivity of single auditory-nerve fibers in response to broadband noise stimuli”. J. Acoust. Soc. Am. 62, 135–142 (1977)
  35. Moore, B., Peters, R. W., Glasberg, B. R.: “Auditory filter shapes at low center frequencies”. J. Acoust. Soc. Am. 88, 132–148 (1990)
  36. Moore, B. C. J., Glasberg, B. R.: “Suggested formula for calculating auditory-filter bandwidth and excitation patterns”. J. Acoust. Soc. Am. 74, 750–753 (1983)
  37. Moore, B. C.: An introduction to the psychology of hearing. 3rd edn. Academic Press, NY (1997)
  38. Nedzelnitsky, V.: “Sound pressures in the basal turn of the cat cochlea”. J. Acoust. Soc. Am. 68, 1676–1680 (1980)
  39. Patterson, R. D.: “Auditory filter shapes derived with noise stimuli”. J. Acoust. Soc. Am. 59, 640–654 (1976)
  40. Pickles, J. O.: An introduction to the physiology of hearing. 2nd edn. Academic Press, New York (1988)
  41. Rao, R., Bopardikar, A.: Wavelet Transforms. 2nd edn. Addison-Wesley, MA (1998)
  42. Sellami, L., Newcomb, R. W.: “A digital scattering model of the cochlea”. IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications 44, 174–180 (1997)
  43. Sellick, P. M., Patuzzi, R., Johnstone, B. M.: “Measurement of basilar membrane motion in the guinea pig using the Mössbauer technique”. J. Acoust. Soc. Am. 72, 131–141 (1982)
  44. Shaw, E. A. G.: “The external ear”. In: W. D. Keidel, W. D. Neff (eds.) Handbook of Sensory Physiology. Springer-Verlag, New York (1974)
  45. Teich, M. C., Heneghan, C., Khanna, S. M.: “Analysis of cellular vibrations in the living cochlea using the continuous wavelet transform and the short-time Fourier transform”. In: M. Akay (ed.) Time frequency and wavelets in biomedical signal processing, pp. 243–269 (1998)
  46. Torrence, C., Compo, G. P.: “A practical guide to wavelet analysis”. Bulletin of the American Meteorological Society 79, 61–78 (1998)
  47. Volkmer, M.: “Theoretical analysis of a time-frequency-PCNN auditory cortex model”. International J. of Neural Systems 15, 339–347 (2005)
  48. von Békésy, G.: Experiments in hearing. 2nd edn. McGraw-Hill, New York (1998)
  49. Wang, D., Brown, G. J.: “Fundamentals of computational auditory scene analysis”. In: D. Wang, G. J. Brown (eds.) Computational Auditory Scene Analysis. IEEE Press, NJ (2006)
  50. Wang, K., Shamma, S. A.: “Spectral shape analysis in the central auditory system”. IEEE Trans. on Speech and Audio Processing 3, 382–395 (1995)
  51. Weintraub, M.: A theory and computational model of auditory monaural sound separation. PhD thesis, Stanford University, CA, August 1985
  52. Wilson, J. P., Johnstone, J.: “Basilar membrane and middle-ear vibration in guinea pig measured by capacitive probe”. J. Acoust. Soc. Am. 57, 705–723 (1975)
  53. Wilson, J. P., Johnstone, J.: “Capacitive probe measures of basilar membrane vibrations in”. Hearing Theory, 1972
  54. Yost, W.: Fundamentals of Hearing: An Introduction, 3rd Edition. 2nd edn. Academic Press, New York (1994)
  55. Zhou, B.: “Auditory filter shapes at high frequencies”. J. Acoust. Soc. Am. 98, 1935–1942
  56. Zilany, M., Bruce, I.: “Modeling auditory-nerve response for high sound pressure levels in the normal and impaired auditory periphery”. J. Acoust. Soc. Am. 120, 1447–1466 (2006)
  57. Zweig, G., Lipes, R., Pierce, J. R.: “The cochlear compromise”. J. Acoust. Soc. Am. 59, 975–982 (1976)
  58. Zwicker, E., Terhardt, E.: “Analytical expressions for critical-band rate and critical bandwidth as a function of frequency”. J. Acoust. Soc. Am. 68, 1523–1525 (1980)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Li Creative Technologies (LcT), Inc., Florham Park, USA
