GMM-Based Speaker Verification System with Hardware MFCC in SoC Design

Tsai, Tsung-Han; Wang, Chiao-Li

doi:10.1007/s11042-023-17561-6

Published: 14 December 2023

Multimedia Tools and Applications

Abstract

In recent years, speaker verification has been explored extensively, and its effectiveness has improved significantly. Speaker verification analyzes the voiceprint characteristics of speakers and uses the differences in those characteristics between speakers to verify identity. In this paper, we propose a text-dependent speaker verification system and a hardware implementation of its feature extraction. The proposed system has two phases: enrollment and verification. In the enrollment phase, the speaker provides appropriate speech, such as continuous number strings, sentences, or phrases, from which the system builds the speaker's model. In the verification phase, the speech to be verified is scored against the enrolled speaker models, and the similarity between the speech and the models is used to accept or reject the speaker. We further design the whole system as a system-on-a-chip (SoC). We implement the Mel-frequency cepstral coefficient (MFCC) pre-processing module on an FPGA and implement the lightweight post-processing models, the Gaussian Mixture Model (GMM) and the Hidden Markov Model (HMM), in software. A piece of speech data can be processed in 53.6 ms, which meets the real-time requirement. The proposed speaker verification system achieves an accuracy of 93.3%. The overall architecture consumes only 4.26 W on a Xilinx ZCU104. Moreover, the proposed MFCC chip was implemented in a TSMC 90 nm process; the gate count is 276k, and the power consumption is 41.15 mW at 1 V with a 200 MHz operating frequency.
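The enrollment and verification phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a diagonal-covariance GMM fitted with plain EM, uses synthetic 2-D vectors as a stand-in for MFCC frames, omits the HMM stage entirely, and every name in it (`enroll`, `verify`, `synth`, `THRESHOLD`) is hypothetical.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def frame_log_terms(x, gmm):
    """Per-component log(weight * density) for one frame."""
    return [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]

def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of an utterance under a GMM."""
    return sum(log_sum_exp(frame_log_terms(x, gmm)) for x in frames) / len(frames)

def enroll(frames, n_components=2, n_iter=20, var_floor=1e-3, seed=0):
    """Enrollment: fit a diagonal GMM to the speaker's frames with plain EM."""
    rng = random.Random(seed)
    dim = len(frames[0])
    # Initialise means from randomly chosen frames, unit variances, equal weights.
    gmm = [(1.0 / n_components, list(m), [1.0] * dim)
           for m in rng.sample(frames, n_components)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            terms = frame_log_terms(x, gmm)
            norm = log_sum_exp(terms)
            resp.append([math.exp(t - norm) for t in terms])
        # M-step: re-estimate weights, means, and floored variances.
        new_gmm = []
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-10
            mean = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                    for d in range(dim)]
            var = [max(var_floor,
                       sum(r[k] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, frames)) / nk)
                   for d in range(dim)]
            new_gmm.append((nk / len(frames), mean, var))
        gmm = new_gmm
    return gmm

def verify(frames, gmm, threshold):
    """Verification: accept if the average log-likelihood clears the threshold."""
    return avg_log_likelihood(frames, gmm) >= threshold

def synth(centers, n, rng):
    """Draw n synthetic 2-D frames around each center (stand-in for MFCCs)."""
    return [[rng.gauss(c, 1.0) for c in center]
            for center in centers for _ in range(n)]

rng = random.Random(1)
speaker_model = enroll(synth([(0, 0), (3, 3)], 200, rng))
genuine = synth([(0, 0), (3, 3)], 50, rng)   # same distribution as enrollment
impostor = synth([(-8, 8)], 50, rng)         # a clearly different "speaker"
THRESHOLD = -8.0  # hypothetical value picked for this synthetic data only
print(verify(genuine, speaker_model, THRESHOLD),
      verify(impostor, speaker_model, THRESHOLD))
```

In a real system the decision threshold would be tuned on held-out data, and the score is often normalized against a background model rather than compared to a fixed constant; the abstract does not state which scoring rule the authors use.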

Data availability

The datasets generated and/or analyzed during the present study are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

Department of Electrical Engineering, National Central University, No. 300, Jung-Da Rd., Taoyuan County 320, Jung-Li City, Taiwan, Republic of China
Tsung-Han Tsai & Chiao-Li Wang

Corresponding author

Correspondence to Tsung-Han Tsai.

Ethics declarations

Conflict of Interests

The authors declare no conflict of interest.

About this article

Cite this article

Tsai, TH., Wang, CL. GMM-Based Speaker Verification System with Hardware MFCC in SoC Design. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-17561-6

Received: 05 June 2022
Revised: 16 May 2023
Accepted: 17 October 2023
Published: 14 December 2023
DOI: https://doi.org/10.1007/s11042-023-17561-6
