Isolate Speech Recognition Based on Time-Frequency Analysis Methods

  • Alfredo Mantilla-Caeiros
  • Mariko Nakano Miyatake
  • Hector Perez-Meana
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5856)


A feature extraction method for isolated speech recognition is proposed. It is based on a time-frequency analysis that uses a critical band concept similar to that of the inner ear model, emulating the behavior of the inner ear by decomposing the signal in a manner similar to that carried out by the basilar membrane. Evaluation results show that the proposed method performs better than other previously proposed feature extraction methods when it is used to characterize normal as well as esophageal speech signals.


Feature extraction, inner ear model, isolate speech recognition, time-frequency analysis



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Alfredo Mantilla-Caeiros (1)
  • Mariko Nakano Miyatake (2)
  • Hector Perez-Meana (2)
  1. Instituto Tecnologico de Monterrey, Mexico
  2. ESIME Culhuacan, Instituto Politécnico Nacional, Mexico, Mexico
