Hardware Implementation of MFCC-Based Feature Extraction for Speaker Recognition

  • P. Ehkan
  • F. F. Zakaria
  • M. N. M. Warip
  • Z. Sauli
  • M. Elshaikh
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 315)


Feature extraction is one of the most important issues in speech recognition, because the extracted features are what represent the speech signal. The Mel Frequency Cepstral Coefficient (MFCC) is one of the most important features required in a wide range of speech applications. In this paper, an FPGA-based implementation of the MFCC speech feature extraction algorithm is proposed. The computational complexity and the memory requirements are characterized, analyzed, and improved. A look-up table (LUT) scheme is used to evaluate the elementary functions in the MFCC algorithm, and fixed-point arithmetic is implemented to reduce cost, guided by an accuracy study. The final feature extraction design is implemented on a Xilinx Virtex2 XC2V6000 FF1157-4 FPGA chip.
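As an illustration of the LUT and fixed-point idea mentioned in the abstract, the following is a minimal sketch of a table-driven base-2 logarithm, the kind of elementary function an MFCC pipeline needs when taking the log of filter-bank energies. The abstract does not state which functions are tabulated, the table size, or the word lengths, so the 64-entry table, the Q8 output format, and the names `LOG2_LUT` and `log2_fixed` are all assumptions made for this example and are not taken from the paper.

```python
import math

FRAC_BITS = 8   # assumed: result has 8 fractional bits (Q8)
LUT_BITS = 6    # assumed: 2**6 = 64 table entries

# Precomputed table of log2(1 + i/64) scaled to Q8, for i = 0..63.
# In hardware this would be a small ROM; here it is built once at import time.
LOG2_LUT = [
    round(math.log2(1 + i / (1 << LUT_BITS)) * (1 << FRAC_BITS))
    for i in range(1 << LUT_BITS)
]


def log2_fixed(x: int) -> int:
    """Approximate log2(x) for a positive integer x, returned in Q8 fixed point."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    # Integer part of the logarithm: position of the leading one bit.
    exponent = x.bit_length() - 1
    # Normalise so the leading one sits at bit LUT_BITS (value in 64..127).
    if exponent >= LUT_BITS:
        mantissa = x >> (exponent - LUT_BITS)
    else:
        mantissa = x << (LUT_BITS - exponent)
    # Drop the leading one to get the table index (0..63).
    index = mantissa - (1 << LUT_BITS)
    # Fractional part comes from the table; no multiplier or divider is needed.
    return (exponent << FRAC_BITS) + LOG2_LUT[index]
```

Calling `log2_fixed(8)` returns 768, which is 3.0 in Q8. Because the mantissa is truncated to 6 bits, the approximation error for other inputs is bounded by roughly log2(1 + 1/64), about 0.022; a larger table or interpolation would trade memory for accuracy.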


Speaker recognition, Mel frequency cepstral coefficients, Field programmable gate array



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • P. Ehkan (1)
  • F. F. Zakaria (1)
  • M. N. M. Warip (1)
  • Z. Sauli (2)
  • M. Elshaikh (1)
  1. School of Computer and Communication Engineering, Universiti Malaysia Perlis, Arau, Malaysia
  2. School of Microelectronic Engineering, Universiti Malaysia Perlis, Arau, Malaysia
