Abstract
Speaker verification has been studied extensively in recent years, and its effectiveness has improved significantly. It analyzes the voiceprint characteristics of speakers and uses the differences in those characteristics between speakers to verify identity. In this paper, we propose a text-dependent speaker verification system together with a hardware implementation of its feature extraction. The proposed system has two phases: enrollment and verification. In the enrollment phase, the speaker provides suitable speech, such as continuous number strings, sentences, or phrases, from which the system builds the speaker's model. In the verification phase, the speech to be verified is scored against the enrolled speaker models, and the similarity between the speech and the models determines whether the claimed identity is accepted. We further design the whole system as a system-on-a-chip (SoC). We implement the Mel-frequency cepstral coefficient (MFCC) pre-processing module on an FPGA and run the lightweight post-processing models, namely the Gaussian Mixture Model (GMM) and the Hidden Markov Model (HMM), in software. A piece of speech data is processed in 53.6 ms, which meets the real-time requirement. The proposed speaker verification system achieves a 93.3% accuracy rate. The overall architecture consumes only 4.26 W on a Xilinx ZCU104. Moreover, the proposed MFCC chip was implemented in a TSMC 90 nm process with a gate count of 276k; at 1 V and a 200 MHz operating frequency, it consumes 41.15 mW.
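The verification step described above, in which speech features are scored against an enrolled speaker model, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a diagonal-covariance GMM, uses the average per-frame log-likelihood as the similarity score, and makes the decision with a fixed threshold. The function names, the toy two-dimensional "feature frames", the model parameters, and the threshold value are all hypothetical and chosen only for illustration.

```python
import math


def diag_gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of feature frames under a
    diagonal-covariance GMM (illustrative similarity score)."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            # log( w * N(x; mu, diag(var)) ), summed over feature dimensions
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        # log-sum-exp over mixture components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)


def verify(frames, model, threshold):
    """Accept the claimed identity if the score reaches the threshold."""
    score = diag_gmm_avg_loglik(frames, *model)
    return score >= threshold, score


# Hypothetical enrolled model: (weights, means, variances) with 2 components.
model = ([0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])

# Toy stand-ins for MFCC frames (real MFCC vectors have more dimensions).
genuine = [[0.1, -0.2], [2.9, 3.1], [0.0, 0.3]]
impostor = [[10.0, -8.0], [9.5, -7.5]]

THRESHOLD = -5.0  # assumed value; in practice tuned on held-out data
accepted_genuine, score_genuine = verify(genuine, model, THRESHOLD)
accepted_impostor, score_impostor = verify(impostor, model, THRESHOLD)
print(accepted_genuine, round(score_genuine, 3))
print(accepted_impostor, round(score_impostor, 3))
```

In a real system the frames would come from the MFCC front end and the model parameters from enrollment, typically estimated with the expectation-maximization algorithm. Many systems also compare against a background model rather than a fixed threshold on the raw score.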
Data availability
The datasets generated and/or analyzed during the present study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of Interests
The authors declare no conflict of interest.
About this article
Cite this article
Tsai, TH., Wang, CL. GMM-Based Speaker Verification System with Hardware MFCC in SoC Design. Multimed Tools Appl 83, 56991–57010 (2024). https://doi.org/10.1007/s11042-023-17561-6
DOI: https://doi.org/10.1007/s11042-023-17561-6