Third-Order Moments of Filtered Speech Signals for Robust Speech Recognition

  • Kevin M. Indrebo
  • Richard J. Povinelli
  • Michael T. Johnson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3817)

Abstract

Novel speech features computed from third-order statistics of subband-filtered speech signals are introduced and studied for robust speech recognition. These features have the potential to capture nonlinear information not represented by cepstral coefficients. Because they are based on third-order moments, they may also be more robust to Gaussian noise than cepstral features, since Gaussian distributions have zero third-order central moments. Experiments on the AURORA2 database combining these features with Mel-frequency cepstral coefficients (MFCCs) are presented; some improvement over the MFCC-only baseline is shown when clean speech is used for training, although the same improvement is not seen when multi-condition training data is used.
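As a rough, self-contained sketch of the kind of feature computation the abstract describes (not the authors' exact implementation), the Python snippet below band-pass filters a single speech frame into subbands and computes one third central moment per band. The Butterworth filter bank, band edges, frame length, and the absence of any moment normalization are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, lfilter

def subband_third_moments(frame, sample_rate, bands):
    """Third central moment of each subband-filtered version of `frame`.

    `bands` is a list of (low_hz, high_hz) tuples. The filter design and
    band layout are illustrative choices, not the paper's configuration.
    """
    moments = []
    for low, high in bands:
        # 4th-order Butterworth band-pass filter for this subband
        b, a = butter(4, [low, high], btype="bandpass", fs=sample_rate)
        sub = lfilter(b, a, frame)
        centered = sub - np.mean(sub)
        # Third central moment; for zero-mean Gaussian noise this statistic
        # tends toward zero, which motivates its use as a robust feature.
        moments.append(np.mean(centered ** 3))
    return np.array(moments)

# Example: one 25 ms frame at 8 kHz with four illustrative subbands
frame = np.random.randn(200)
bands = [(100, 1000), (1000, 2000), (2000, 3000), (3000, 3900)]
print(subband_third_moments(frame, 8000, bands))
```

Such per-band moments would then be appended to (or fused with) the MFCC vector for each frame before acoustic modeling, which is the combination evaluated in the experiments.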

Keywords

Speech Recognition, Speech Signal, Speech Recognition System, Clean Speech, Word Error Rate

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kevin M. Indrebo 1
  • Richard J. Povinelli 1
  • Michael T. Johnson 1
  1. Dept. of Electrical and Computer Engineering, Marquette University, Milwaukee, USA
