Auditory-Based Time Frequency Transform

Li, Qi (Peter)

doi:10.1007/978-3-642-23731-7_7

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Time-frequency transforms play an important role in signal processing. Many speech processing algorithms need to convert the time-domain speech signal to the frequency domain. The Fourier transform (FT) and the fast Fourier transform (FFT) have been used for decades, but they are not robust to background noise. As shown in this chapter, the FFT generates computation noise and pitch harmonics during its computation. In a different approach, the traveling wave in the cochlea has been modeled as a Gammatone function, and a bank of such functions has been used as a forward transform to decompose the input signal into different frequency bands. However, there is no proven inverse transform for the Gammatone filter bank, and its filter bandwidths are fixed and cannot be adjusted for different kinds of applications. To address these issues, the author presented a robust, invertible, auditory-based time-frequency transform, named the auditory-based transform or auditory transform (AT), in [22, 23]. This chapter provides a detailed introduction to the AT.
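The abstract contrasts the AT with the Gammatone filter bank, which serves only as a forward transform. As a point of reference, here is a minimal plain-Python sketch of that baseline forward analysis. It assumes the common 4th-order gammatone impulse response t^(n-1) e^(-2πbt) cos(2πf_c t), a bandwidth of b = 1.019·ERB(f_c), and the Glasberg and Moore (1990) ERB formula; none of these formulas is taken from this chapter, the unit-energy normalization is an arbitrary choice, and all function names are hypothetical. It is not the author's AT.

```python
import math

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg & Moore 1990 formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, order=4, duration=0.05):
    """Sampled gammatone impulse response t^(n-1) e^(-2*pi*b*t) cos(2*pi*fc*t)."""
    b = 1.019 * erb(fc)  # bandwidth scaling commonly paired with order 4 (assumption)
    n_samples = int(round(duration * fs))
    g = []
    for k in range(n_samples):
        t = k / fs
        g.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                 * math.cos(2 * math.pi * fc * t))
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]  # unit-energy normalization (arbitrary choice)

def convolve_causal(x, h):
    """Causal FIR filtering: y[n] = sum_k h[k] * x[n - k], truncated to len(x)."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(len(h), n + 1)):
            acc += h[k] * x[n - k]
        y[n] = acc
    return y

def gammatone_analysis(x, fs, centre_freqs, order=4):
    """Forward analysis only: one band-limited signal per centre frequency."""
    return [convolve_causal(x, gammatone_ir(fc, fs, order)) for fc in centre_freqs]
```

Feeding a 1 kHz tone sampled at 8 kHz through bands centred at 250, 500, 1000, 2000 and 3000 Hz concentrates the output energy in the 1000 Hz band. The bandwidths follow directly from the fixed ERB formula and the code has no synthesis step, which mirrors the two limitations the abstract attributes to the Gammatone filter bank.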

References

[1] Allen, J.: “Cochlear modeling”. IEEE ASSP Magazine, pp. 3–29 (Jan. 1985)
[2] Barbour, D. L., Wang, X.: “Contrast tuning in auditory cortex”. Science 299, 1073–1075 (2003)
[3] Bruce, I., Sachs, M., Young, E.: “An auditory-periphery model of the effects of acoustic trauma on auditory nerve responses”. J. Acoust. Soc. Am. 113, 369–388 (2003)
[4] Choueiter, G. F., Glass, J. R.: “An implementation of rational wavelets and filter design for phonetic classification”. IEEE Trans. on Audio, Speech, and Language Processing 15, 939–948 (2007)
[5] Daubechies, I., Maes, S.: “A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models”. In: Aldroubi, A., Unser, M. (eds.) Wavelets in Medicine and Biology, pp. 527–546. CRC Press (1996)
[6] Davis, S. B., Mermelstein, P.: “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences”. IEEE Trans. on Acoustics, Speech, and Signal Processing ASSP-28, 357–366 (1980)
[7] Evans, E. F.: “Frequency selectivity at high signal levels of single units in cochlear nerve and cochlear nucleus”. In: Evans, E. F., Wilson, J. P. (eds.) Psychophysics and Physiology of Hearing, pp. 195–192. Academic Press, London, UK (1977)
[8] Flanagan, J. L.: Speech analysis synthesis and perception. Springer-Verlag, New York (1972)
[9] Fletcher, H.: Speech and hearing in communication. Acoustical Society of America (1995)
[10] Furui, S.: “Cepstral analysis techniques for automatic speaker verification”. IEEE Trans. on Acoustics, Speech, and Signal Processing 29, 254–277 (1981)
[11] Gelfand, S. A.: Hearing: an introduction to psychological and physiological acoustics. 3rd edn. Marcel Dekker, New York (1998)
[12] Ghitza, O.: “Auditory models and human performance in tasks related to speech coding and speech recognition”. IEEE Trans. on Speech and Audio Processing 2, 115–132 (1994)
[13] Goldstein, J. L.: “Modeling rapid waveform compression on the basilar membrane as a multiple-bandpass-nonlinear filtering”. Hearing Res. 49, 39–60 (1990)
[14] Hermansky, H., Morgan, N.: “RASTA processing of speech”. IEEE Trans. on Speech and Audio Processing 2, 578–589 (1994)
[15] Hohmann, V.: “Frequency analysis and synthesis using a Gammatone filterbank”. Acta Acustica united with Acustica 88, 433–442 (2002)
[16] Johannesma, P. I. M.: “The pre-response stimulus ensemble of neurons in the cochlear nucleus”. In: Proceedings of the Symposium on Hearing Theory, IPO, pp. 58–69 (1972)
[17] Johnson, R. A., Wichern, D. W.: Applied Multivariate Statistical Analysis. 3rd edn. Prentice Hall, New Jersey (1988)
[18] Kates, J. M.: “Accurate tuning curves in cochlea model”. IEEE Trans. on Speech and Audio Processing 1, 453–462 (1993)
[19] Kates, J. M.: “A time-domain digital cochlea model”. IEEE Trans. on Signal Processing 39, 2573–2592 (1991)
[20] Khanna, S. M., Leonard, D. G. B.: “Basilar membrane tuning in the cat cochlea”. Science 215, 305–306 (Jan. 1982)
[21] Kiang, N. Y.-S.: Discharge patterns of single fibers in the cat’s auditory nerve. 3rd edn. MIT Press, MA (1965)
[22] Li, Q.: “An auditory-based transform for audio signal processing”. In: Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY (Oct. 2009)
[23] Li, Q.: “Solution for pervasive speaker recognition”. SBIR Phase I Proposal, submitted to NSF IT.F4, Li Creative Technologies, Inc., NJ (June 2003)
[24] Li, Q., Huang, Y.: “An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions”. IEEE Trans. on Audio, Speech, and Language Processing (Sept. 2011)
[25] Li, Q., Huang, Y.: “Robust speaker identification using an auditory-based feature”. In: Proceedings of ICASSP 2010 (2010)
[26] Li, Q., Soong, F. K., Siohan, O.: “An auditory system-based feature for robust speech recognition”. In: Proceedings of the 7th European Conference on Speech Communication and Technology, Denmark, pp. 619–622 (Sept. 2001)
[27] Li, Q., Soong, F. K., Siohan, O.: “A high-performance auditory feature for robust speech recognition”. In: Proceedings of the 6th International Conference on Spoken Language Processing, Beijing, pp. III-51–54 (Oct. 2000)
[28] Li, Q., Zheng, J., Tsai, A., Zhou, Q.: “Robust endpoint detection and energy normalization for real-time speech and speaker recognition”. IEEE Trans. on Speech and Audio Processing 10, 146–157 (2002)
[29] Lin, J., Ki, W.-H., Edwards, T., Shamma, S.: “Analog VLSI implementations of auditory wavelet transforms using switched-capacitor circuits”. IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications 41, 572–583 (1994)
[30] Liu, W., Andreou, A. G., Goldstein, M. H., Jr.: “Voiced-speech representation by an analog silicon model of the auditory periphery”. IEEE Trans. on Neural Networks 3, 477–487 (1992)
[31] Lyon, R. F., Mead, C.: “An analog electronic cochlea”. IEEE Trans. on Acoustics, Speech, and Signal Processing 36, 1119–1134 (1988)
[32] Mak, B., Tam, Y.-C., Li, Q.: “Discriminative auditory features for robust speech recognition”. IEEE Trans. on Speech and Audio Processing 12, 27–36 (2004)
[33] Misiti, M., Misiti, Y., Oppenheim, G., Poggi, J.-M.: Wavelet Toolbox User’s Guide. 3rd edn. MathWorks, MA (2006)
[34] Møller, A. R.: “Frequency selectivity of single auditory-nerve fibers in response to broadband noise stimuli”. J. Acoust. Soc. Am. 62, 135–142 (1977)
[35] Moore, B., Peters, R. W., Glasberg, B. R.: “Auditory filter shapes at low center frequencies”. J. Acoust. Soc. Am. 88, 132–148 (1990)
[36] Moore, B. C. J., Glasberg, B. R.: “Suggested formula for calculating auditory-filter bandwidth and excitation patterns”. J. Acoust. Soc. Am. 74, 750–753 (1983)
[37] Moore, B. C.: An introduction to the psychology of hearing. 3rd edn. Academic Press, NY (1997)
[38] Nedzelnitsky, V.: “Sound pressures in the basal turn of the cat cochlea”. J. Acoust. Soc. Am. 68, 1676–1680 (1980)
[39] Patterson, R. D.: “Auditory filter shapes derived with noise stimuli”. J. Acoust. Soc. Am. 59, 640–654 (1976)
[40] Pickles, J. O.: An introduction to the physiology of hearing. 2nd edn. Academic Press, New York (1988)
[41] Rao, R., Bopardikar, A.: Wavelet Transforms. 2nd edn. Addison-Wesley, MA (1998)
[42] Sellami, L., Newcomb, R. W.: “A digital scattering model of the cochlea”. IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications 44, 174–180 (1997)
[43] Sellick, P. M., Patuzzi, R., Johnstone, B. M.: “Measurement of basilar membrane motion in the guinea pig using the Mössbauer technique”. J. Acoust. Soc. Am. 72, 131–141 (1982)
[44] Shaw, E. A. G.: “The external ear”. In: Keidel, W. D., Neff, W. D. (eds.) Handbook of Sensory Physiology. Springer-Verlag, New York (1974)
[45] Teich, M. C., Heneghan, C., Khanna, S. M.: “Analysis of cellular vibrations in the living cochlea using the continuous wavelet transform and the short-time Fourier transform”. In: Akay, M. (ed.) Time frequency and wavelets in biomedical signal processing, pp. 243–269 (1998)
[46] Torrence, C., Compo, G. P.: “A practical guide to wavelet analysis”. Bulletin of the American Meteorological Society 79, 61–78 (1998)
[47] Volkmer, M.: “Theoretical analysis of a time-frequency-PCNN auditory cortex model”. International J. of Neural Systems 15, 339–347 (2005)
[48] von Békésy, G.: Experiments in hearing. 2nd edn. McGraw-Hill, New York (1998)
[49] Wang, D., Brown, G. J.: “Fundamentals of computational auditory scene analysis”. In: Wang, D., Brown, G. J. (eds.) Computational Auditory Scene Analysis. IEEE Press, NJ (2006)
[50] Wang, K., Shamma, S. A.: “Spectral shape analysis in the central auditory system”. IEEE Trans. on Speech and Audio Processing 3, 382–395 (1995)
[51] Weintraub, M.: A theory and computational model of auditory monaural sound separation. PhD thesis, Stanford University, CA (Aug. 1985)
[52] Wilson, J. P., Johnstone, J.: “Basilar membrane and middle-ear vibration in guinea pig measured by capacitive probe”. J. Acoust. Soc. Am. 57, 705–723 (1975)
[53] Wilson, J. P., Johnstone, J.: “Capacitive probe measures of basilar membrane vibrations in”. In: Hearing Theory (1972)
[54] Yost, W.: Fundamentals of Hearing: An Introduction. 3rd edn. Academic Press, New York (1994)
[55] Zhou, B.: “Auditory filter shapes at high frequencies”. J. Acoust. Soc. Am. 98, 1935–1942
[56] Zilany, M., Bruce, I.: “Modeling auditory-nerve response for high sound pressure levels in the normal and impaired auditory periphery”. J. Acoust. Soc. Am. 120, 1447–1466 (2006)
[57] Zweig, G., Lipes, R., Pierce, J. R.: “The cochlear compromise”. J. Acoust. Soc. Am. 59, 975–982 (1976)
[58] Zwicker, E., Terhardt, E.: “Analytical expressions for critical-band rate and critical bandwidth as a function of frequency”. J. Acoust. Soc. Am. 68, 1523–1525 (1980)

Author information

Authors and Affiliations

Li Creative Technologies (LcT), Inc., 30 A Vreeland Road, Suite 130, Florham Park, NJ 07932, USA
Qi (Peter) Li

Corresponding author

Correspondence to Qi (Peter) Li.

About this chapter

Cite this chapter

Li, Q. (2012). Auditory-Based Time Frequency Transform. In: Speaker Authentication. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23731-7_7

DOI: https://doi.org/10.1007/978-3-642-23731-7_7
Published: 30 September 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23730-0
Online ISBN: 978-3-642-23731-7
eBook Packages: Engineering, Engineering (R0)