GMM-Based Speaker Verification System with Hardware MFCC in SoC Design

Multimedia Tools and Applications

Abstract

In recent years, speaker verification has been extensively explored, and its effectiveness has improved significantly. It analyzes the voiceprint characteristics of speakers and uses the differences between them to verify identity. In this paper, we propose a text-dependent speaker verification system and a hardware implementation of its feature extraction. The proposed system has two phases: enrollment and verification. In the enrollment phase, the speaker provides appropriate speech, such as continuous number strings, sentences, or phrases, from which the system builds the speaker's model. In the verification phase, the speech to be verified is scored against the enrolled speaker models, and the similarity between the speech and the models determines whether the claimed identity is accepted. We further design the whole system as a system-on-a-chip (SoC). We focus on implementing the Mel-frequency cepstral coefficient (MFCC) pre-processing module on an FPGA and implement the lightweight post-processing models, such as the Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM), in software. A piece of speech data can be processed in 53.6 ms, which meets the real-time requirement. The proposed speaker verification system achieves an accuracy of 93.3%. The overall architecture consumes only 4.26 W on a Xilinx ZCU104. Moreover, the proposed MFCC chip was implemented in a TSMC 90 nm process; the gate count is 276k, and the power consumption is 41.15 mW at 1 V with a 200 MHz operating frequency.
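The enrollment and verification phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a diagonal-covariance GMM trained with plain EM, an average log-likelihood score compared against a fixed threshold, and feature frames (for example MFCC vectors) that are already available as a NumPy array. The function names `enroll`, `score`, and `verify`, the mixture size `k`, and the variance floor are all hypothetical choices made for illustration.

```python
import numpy as np

def _log_sum_exp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def _component_log_probs(X, weights, means, variances):
    # Log of weight_k * N(x | mean_k, diag(var_k)) for every frame and component.
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                 # (n, k, d)
    mahal = (diff ** 2 / variances[None, :, :]).sum(axis=2)  # (n, k)
    log_det = np.log(variances).sum(axis=1)                  # (k,)
    return np.log(weights) - 0.5 * (d * np.log(2 * np.pi) + log_det + mahal)

def enroll(X, k=4, iters=25, seed=0, floor=1e-3):
    """Enrollment (assumed form): fit a diagonal GMM to frames X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        log_p = _component_log_probs(X, weights, means, variances)
        resp = np.exp(log_p - _log_sum_exp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, floor)
    return weights, means, variances

def score(X, model):
    """Average per-frame log-likelihood of frames X under an enrolled model."""
    weights, means, variances = model
    log_p = _component_log_probs(X, weights, means, variances)
    return float(_log_sum_exp(log_p, axis=1).mean())

def verify(X, model, threshold):
    """Verification (assumed rule): accept when the score reaches the threshold."""
    return score(X, model) >= threshold
```

In practice the threshold would be tuned on held-out data, and many systems normalize the score against a background model; both are outside what the abstract specifies.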

Data availability

The datasets generated and/or analyzed during the present study are available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Tsung-Han Tsai.

Ethics declarations

Conflict of Interests

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tsai, TH., Wang, CL. GMM-Based Speaker Verification System with Hardware MFCC in SoC Design. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-17561-6

  • DOI: https://doi.org/10.1007/s11042-023-17561-6
