Isolated Word Recognition Using Enhanced MFCC and IIFs

  • S. D. Umarani
  • R. S. D. Wahidabanu
  • P. Raviram
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 199)


The main objective of this paper is to design a noise-resilient, speaker-independent system for isolated word recognition. Mel-frequency cepstral coefficients (MFCCs) are used for feature extraction. The noise robustness of MFCCs under mismatched training and testing conditions is enhanced by applying a wavelet-based denoising algorithm, and an invariant-integration method is applied to make the MFCCs robust to variation in vocal tract length (VTL). The resulting features are called enhanced MFCC invariant-integration features (EMFCCIIFs). A classifier called the feature-finding neural network (FFNN) is used to recognize the isolated words. Results are compared with those obtained using traditional MFCC features. Experiments show that, under mismatched conditions, the EMFCCIIFs maintain a high recognition rate at low signal-to-noise ratios (SNRs) and also perform more effectively at high SNRs.
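
As a rough illustration of the wavelet-based denoising step mentioned above, the following sketch applies soft thresholding to the detail coefficients of a one-level Haar transform. The abstract does not state which wavelet, decomposition depth, or threshold rule the authors use, so the Haar wavelet, the single level, and the function names (`haar_forward`, `haar_inverse`, `soft_threshold`, `denoise`) are all assumptions made for illustration only.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x):
    """One-level Haar transform of an even-length sequence (assumed wavelet)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; zero those with magnitude below t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise(x, threshold):
    """Threshold only the detail band, then reconstruct the signal."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, threshold))

if __name__ == "__main__":
    noisy = [1.0, 1.2, 1.0, 0.8]
    print(denoise(noisy, 0.1))
```

With a threshold of zero the signal is reconstructed unchanged; with a threshold larger than every detail coefficient, each pair of samples is replaced by its mean. A practical system would typically use several decomposition levels and a data-driven threshold before computing MFCCs on the cleaned signal.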


Keywords: Isolated word, Denoising, Invariant-integration, MFCC, IIF, FFNN, SNR





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • S. D. Umarani (1)
  • R. S. D. Wahidabanu (1)
  • P. Raviram (2)
  1. Government College of Engineering, Salem, India
  2. Department of CSE, Mahendra Engineering College, Tiruchengode, India
