Auditory-Based Feature Extraction and Robust Speaker Identification

  • Qi (Peter) Li
Part of the Signals and Communication Technology book series (SCT)


In the previous chapter, we introduced a robust auditory transform (AT). In this chapter, we present an auditory-based feature extraction algorithm built on the AT and apply it to robust speaker identification. The performance of acoustic models trained on clean speech usually drops significantly when they are tested on noisy speech. The presented features, however, have shown strong robustness in this kind of mismatched situation. In the experiment section, we present a typical text-independent speaker identification system. Under all three mismatched testing conditions (white noise, car noise, and babble noise), the auditory features consistently perform better than the baseline mel frequency cepstral coefficient (MFCC) features. The auditory features are also compared with perceptual linear predictive (PLP) and RASTA-PLP features. They consistently perform much better than PLP. Under white noise, the MFCC features are much better than RASTA-PLP. Under car and babble noise, the performance is similar.
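The mismatched condition described above (models trained on clean speech, tested on noisy speech) is commonly simulated by mixing a noise recording into clean test utterances at a chosen signal-to-noise ratio. The following is a minimal sketch of that mixing step only, not the chapter's experimental setup: the function names, the 8 kHz sample rate, the sine-wave stand-in for speech, and the 6 dB SNR are all assumptions made for illustration.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then mix."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise has zero power and cannot be scaled")
    # SNR(dB) = 10 * log10(P_clean / P_noise), solved for the target noise power.
    target_noise_power = power(clean) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture when the clean signal is known."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10.0 * math.log10(power(clean) / power(residual))


# Hypothetical stand-ins: a 440 Hz tone for "speech" and Gaussian white noise.
rng = random.Random(0)
sample_rate = 8000  # assumed value, not taken from the chapter
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
white = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
noisy = add_noise_at_snr(clean, white, snr_db=6.0)
```

In a test of this kind, the speaker models would be trained on features from `clean` recordings and scored on features extracted from `noisy` ones. Swapping `white` for a car or babble recording would give the other two conditions.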


Keywords: Fast Fourier Transform; Hair Cell; Discrete Cosine Transform; Basilar Membrane; Speaker Recognition





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Li Creative Technologies (LcT), Inc., Florham Park, USA
