Auditory-Based Time Frequency Transform

Speaker Authentication

Part of the book series: Signals and Communication Technology ((SCT))

Abstract

Time-frequency transforms play an important role in signal processing. Many speech processing algorithms need to convert the time-domain speech signal into the frequency domain. The Fourier transform (FT) and the fast Fourier transform (FFT) have been used for decades, but they are not robust to background noise. As shown in this chapter, the FFT itself generates computational noise and pitch harmonics. In a different approach, the traveling wave in the cochlea was modeled as a Gammatone function. A bank of these functions has been used as a forward transform to decompose the input signal into different frequency bands, but there is no proven inverse transform for the Gammatone filter bank, and the filter bandwidths are fixed and cannot be adjusted for different applications. To address these issues, the author presented a robust, invertible, auditory-based time-frequency transform, named the auditory-based transform or auditory transform (AT), in [22, 23]. This chapter provides a detailed introduction to the AT.
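
As a rough illustration of the Gammatone filter bank mentioned above (not the AT itself, which is defined in the chapter), the following minimal sketch builds fourth-order Gammatone impulse responses and uses them as a forward transform. The bandwidth rule (the Glasberg and Moore ERB approximation with a 1.019 scaling factor), the filter order, the center frequencies, and all function names are assumptions chosen for the example, not taken from the chapter.

```python
import numpy as np

def erb_hz(fc_hz):
    # Assumed bandwidth rule: ERB = 24.7 * (4.37 * f_kHz + 1), in Hz.
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs_hz, duration_s=0.05, order=4):
    # g(t) = t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), for t >= 0.
    t = np.arange(int(duration_s * fs_hz)) / fs_hz
    b = 1.019 * erb_hz(fc_hz)  # bandwidth is fixed once fc is chosen
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc_hz * t)
    return g / np.sqrt(np.sum(g ** 2))  # normalize to unit energy

def gammatone_analysis(x, fs_hz, center_freqs_hz):
    # Forward transform: one band-limited output per center frequency.
    return np.stack([
        np.convolve(x, gammatone_ir(fc, fs_hz))[: len(x)]
        for fc in center_freqs_hz
    ])

# Hypothetical test signal: a 1 kHz tone, 0.25 s long, sampled at 16 kHz.
fs = 16000
t = np.arange(fs // 4) / fs
x = np.sin(2.0 * np.pi * 1000.0 * t)

centers = [250.0, 500.0, 1000.0, 2000.0, 4000.0]
bands = gammatone_analysis(x, fs, centers)
energies = np.sum(bands ** 2, axis=1)
strongest = centers[int(np.argmax(energies))]
print(bands.shape, strongest)
```

In this sketch each bandwidth follows directly from its center frequency, which is the inflexibility the abstract points to, and nothing in the code provides a way to reconstruct `x` from `bands`.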

References

  1. Allen J.: “Cochlear modeling”. IEEE ASSP Magazine, pp. 3–29, Jan. 1985

  2. Barbour, D. L., Wang, X.: “Contrast tuning in auditory cortex”. Science 299, 1073–1075 (2003)

  3. Bruce, I., Sachs, M., Young, E.: “An auditory-periphery model of the effects of acoustic trauma on auditory nerve responses”. J. Acoust. Soc. Am. 113, 369–388 (2003)

  4. Choueiter, G. F., Glass, J. R.: “An implementation of rational wavelets and filter design for phonetic classification”. IEEE Trans. on Audio, Speech, and Language Processing 15, 939–948 (2007)

  5. Daubechies I., Maes S. (1996) “A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models”. In: A. Aldroubi, M. Unser (eds.) Wavelets in Medicine and Biology (CRC Press), pp. 527–546

  6. Davis, S. B., Mermelstein, P.: “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences”. IEEE Trans. on Acoustics, Speech, and Signal Processing ASSP-28, 357–366 (1980)

  7. Evans E. F. (1977) “Frequency selectivity at high signal levels of single units in cochlear nerve and cochlear nucleus”. In: E. F. Evans, J. P. Wilson (eds.) Psychophysics and Physiology of Hearing. London, UK: Academic Press, pp. 195–192

  8. Flanagan, J. L.: Speech analysis synthesis and perception. Springer-Verlag, New York (1972)

  9. Fletcher H.: Speech and hearing in communication. Acoustical Society of America, 1995

  10. Furui, S.: “Cepstral analysis techniques for automatic speaker verification”. IEEE Trans. Acoust., Speech, Signal Processing 27, 254–277 (1981)

  11. Gelfand, S. A.: Hearing, an introduction to psychological and physiological acoustics. 3rd edition. Marcel Dekker, New York (1998)

  12. Ghitza, O.: “Auditory models and human performance in tasks related to speech coding and speech recognition”. IEEE Trans. on Speech and Audio Processing 2, 115–132 (1994)

  13. Goldstein, J. L.: “Modeling rapid waveform compression on the basilar membrane as a multiple-bandpass-nonlinear filtering”. Hearing Res. 49, 39–60 (1990)

  14. Hermansky, H., Morgan, N.: “Rasta processing of speech”. IEEE Trans. Speech and Audio Proc. 2, 578–589 (1994)

  15. Hohmann, V.: “Frequency analysis and synthesis using a Gammatone filterbank”. Acta Acustica united with Acustica 88, 433–442 (2002)

  16. Johannesma, P. I. M.: “The pre-response stimulus ensemble of neurons in the cochlear nucleus”. The proceeding of the symposium on hearing Theory IPO, 58–69 (1972)

  17. Johnson, R. A., Wichern, D. W.: Applied Multivariate Statistical Analysis. 3rd edn. Prentice Hall, New Jersey (1988)

  18. Kates, J. M.: “Accurate tuning curves in cochlea model”. IEEE Trans. on Speech and Audio Processing 1, 453–462 (1993)

  19. Kates, J. M.: “A time-domain digital cochlea model”. IEEE Trans. on Signal Processing 39, 2573–2592 (1991)

  20. Khanna, S. M., Leonard, D. G. B.: “Basilar membrane tuning in the cat cochlea”. Science 215, 305–306 (1982)

  21. Kiang, N. Y.-S.: Discharge patterns of single fibers in the cat’s auditory nerve. 3rd edn. MIT, MA (1965)

  22. Li Q.: “An auditory-based transform for audio signal processing”. in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (New Paltz, NY), Oct. 2009

  23. Li Q.: “Solution for pervasive speaker recognition”. SBIR Phase I Proposal, Submitted to NSF IT.F4, Li Creative Technologies, Inc., NJ, June (2003)

  24. Li Q., Huang Y.: “An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions” IEEE Trans. on Audio, Speech and Language Processing, Sept. 2011

  25. Li Q., Huang Y.: “Robust speaker identification using an auditory-based feature”. in ICASSP 2010 (2010)

  26. Li Q., Soong F. K., Siohan O.: “An auditory system-based feature for robust speech recognition”. in Proc. 7th European Conf. on Speech Communication and Technology (Denmark), pp. 619–622, Sept. (2001)

  27. Li Q., Soong F. K., Siohan O.: “A high-performance auditory feature for robust speech recognition”. in Proceedings of 6th Int’l Conf. on Spoken Language Processing (Beijing), pp. III 51–54, Oct. 2000

  28. Li, Q., Zheng, J., Tsai, A., Zhou, Q.: “Robust endpoint detection and energy normalization for real-time speech and speaker recognition”. IEEE Trans. on Speech and Audio Processing 10, 146–157 (2002)

  29. Lin, J., Ki, W.-H., Edwards, T., Shamma, S.: “Analog VLSI implementations of auditory wavelet transforms using switched-capacitor circuits”. IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications 41, 572–583 (1994)

  30. Liu, W., Andreou, A. G., Goldstein, M. H., Jr.: “Voiced-speech representation by an analog silicon model of the auditory periphery”. IEEE Trans. on Neural Networks 3, 477–487 (1992)

  31. Lyon, R. F., Mead, C.: “An analog electronic cochlea”. IEEE Trans. on Acoustics, Speech, and Signal Processing 36, 1119–1134 (1988)

  32. Mak, B., Tam, Y.-C., Li, Q.: “Discriminative auditory features for robust speech recognition”. IEEE Trans. on Speech and Audio Processing 12, 27–36 (2004)

  33. Misiti, M., Misiti, Y., Oppenheim, G., Poggi, J.-M.: Wavelet Toolbox User’s Guide. 3rd edn. MathWorks, MA (2006)

  34. Møller, A. R.: “Frequency selectivity of single auditory-nerve fibers in response to broadband noise stimuli”. J. Acoust. Soc. Am. 62, 135–142 (1977)

  35. Moore, B., Peters, R. W., Glasberg, B. R.: “Auditory filter shapes at low center frequencies”. J. Acoust. Soc. Am 88, 132–148 (1990)

  36. Moore, B. C. J., Glasberg, B. R.: “Suggested formula for calculating auditory-filter bandwidth and excitation patterns”. J. Acoust. Soc. Am. 74, 750–753 (1983)

  37. Moore, B. C.: An introduction to the psychology of hearing. 3rd edn. Academic Press, NY (1997)

  38. Nedzelnitsky, V.: “Sound pressures in the basal turn of the cat cochlea”. J. Acoust. Soc. Am. 68, 1676–1680 (1980)

  39. Patterson, R. D.: “Auditory filter shapes derived with noise stimuli”. J. Acoust. Soc. Am. 59, 640–654 (1976)

  40. Pickles, J. O.: An introduction to the physiology of hearing. 2nd edn. Academic Press, New York (1988)

  41. Rao, R., Bopardikar, A.: Wavelet Transforms. 2nd edn. Addison-Wesley, MA (1998)

  42. Sellami, L., Newcomb, R. W.: “A digital scattering model of the cochlea”. IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications 44, 174–180 (1997)

  43. Sellick, P. M., Patuzzi, R., Johnstone, B. M.: “Measurement of basilar membrane motion in the guinea pig using the Mossbauer technique”. J. Acoust. Soc. Am. 72, 131–141 (1982)

  44. Shaw E. A. G. The external ear, in Handbook of Sensory Physiology. New York: Springer-Verlag, 1974. W. D. Keidel and W. D. Neff eds

  45. Teich M. C., Heneghan C., Khanna S. M. “Analysis of cellular vibrations in the living cochlea using the continuous wavelet transform and the short-time Fourier transform”. in Time frequency and wavelets in biomedical signal processing, pp. 243–269, 1998. Edited by M. Akay.

  46. Torrence, C., Compo, G. P.: “A practical guide to wavelet analysis”. Bulletin of the American Meteorological Society 79, 61–78 (1998)

  47. Volkmer, M.: “Theoretical analysis of a time-frequency-PCNN auditory cortex model”. International J. of Neural Systems 15, 339–347 (2005)

  48. von Békésy, G.: Experiments in hearing. 2nd edn. McGraw-Hill, New York (1998)

  49. Wang D., Brown G. J. Fundamentals of computational auditory scene analysis in Computational Auditory Scene Analysis Edited by D. Wang and G. J. Brown. NJ: IEEE Press, 2006.

  50. Wang, K., Shamma, S. A.: “Spectral shape analysis in the central auditory system”. IEEE Trans. on Speech and Audio Processing 3, 382–395 (1995)

  51. Weintraub M. A theory and computational model of auditory monaural sound separation. PhD thesis, Stanford University, CA, August 1985

  52. Wilson, J. P., Johnstone, J.: “Basilar membrane and middle-ear vibration in guinea pig measured by capacitive probe”. J. Acoust. Soc. Am. 57, 705–723 (1975)

  53. Wilson J. P., Johnstone J. “Capacitive probe measures of basilar membrane vibrations in”. Hearing Theory, 1972

  54. Yost, W.: Fundamentals of Hearing: An Introduction. 3rd edn. Academic Press, New York (1994)

  55. Zhou B. “Auditory filter shapes at high frequencies”. J. Acoust. Soc. Am 98:1935–1942

  56. Zilany, M., Bruce, I.: “Modeling auditory-nerve response for high sound pressure levels in the normal and impaired auditory periphery”. J. Acoust. Soc. Am 120, 1447–1466 (2006)

  57. Zweig, G., Lipes, R., Pierce, J. R.: “The cochlear compromise”. J. Acoust. Soc. Am. 59, 975–982 (1976)

  58. Zwicker, E., Terhardt, E.: “Analytical expressions for critical-band rate and critical bandwidth as a function of frequency”. J. Acoust. Soc. Am. 68, 1523–1525 (1980)

Author information

Corresponding author

Correspondence to Qi (Peter) Li.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Li, Q. (2012). Auditory-Based Time Frequency Transform. In: Speaker Authentication. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23731-7_7

  • DOI: https://doi.org/10.1007/978-3-642-23731-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23730-0

  • Online ISBN: 978-3-642-23731-7

  • eBook Packages: Engineering, Engineering (R0)
