Speaker recognition utilizing distributed DCT-II based Mel frequency cepstral coefficients and fuzzy vector quantization
In this paper, a novel Automatic Speaker Recognition (ASR) system is presented. The system introduces new feature extraction and vector classification steps that use distributed Discrete Cosine Transform (DCT-II) based Mel Frequency Cepstral Coefficients (MFCC) and Fuzzy Vector Quantization (FVQ). The ASR algorithm uses an MFCC-based approach to identify the dynamic features used for Speaker Recognition (SR). A series of experiments was performed with three feature extraction methods: (1) conventional MFCC; (2) Delta-Delta MFCC (DDMFCC); and (3) DCT-II based DDMFCC. The experiments were then expanded to include four classifiers: (1) FVQ; (2) K-means Vector Quantization (VQ); (3) Linde, Buzo and Gray VQ; and (4) Gaussian Mixture Model (GMM). Among the VQ based classifiers, the combination of DCT-II based MFCC, Delta MFCC (DMFCC) and DDMFCC with FVQ gave the lowest Equal Error Rate. These results improved on previously reported non-GMM methods and approached those achieved by the computationally expensive GMM based method. Speaker verification tests highlighted the overall performance improvement of the new ASR system. The National Institute of Standards and Technology Speaker Recognition Evaluation corpora were used as the source of speaker data for the experiments.
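To make the feature stack named in the abstract concrete, the following minimal sketch computes cepstral coefficients with a plain DCT-II and appends delta (DMFCC) and delta-delta (DDMFCC) features. It is an illustration under stated assumptions, not the paper's method: the abstract does not describe how the distributed DCT-II is applied, so a single conventional DCT-II is used here, and the function names, the regression window `width`, the number of coefficients, and the random stand-in for log Mel filterbank energies are all hypothetical.

```python
import numpy as np


def dct2(x, n_coeffs):
    """Unnormalised DCT-II along the last axis, keeping the first n_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]   # output (cepstral) index
    m = np.arange(n)[None, :]          # input (filterbank) index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T


def delta(feats, width=2):
    """Regression-based delta over +/- width frames (assumed window size).

    Edge frames are handled by repeating the first and last frame.
    """
    n_frames = feats.shape[0]
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, width + 1))
    out = np.zeros(feats.shape, dtype=float)
    for i in range(1, width + 1):
        ahead = padded[width + i : width + i + n_frames]
        behind = padded[width - i : width - i + n_frames]
        out += i * (ahead - behind)
    return out / denom


# Stand-in for log Mel filterbank energies: 10 frames x 20 filters (random, illustrative only).
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(10, 20))

mfcc = dct2(log_mel, n_coeffs=12)          # static cepstral coefficients
dmfcc = delta(mfcc)                        # first-order dynamics
ddmfcc = delta(dmfcc)                      # second-order dynamics
features = np.hstack([mfcc, dmfcc, ddmfcc])  # one 36-dimensional vector per frame
```

Each row of `features` is the kind of per-frame vector that a VQ, FVQ or GMM classifier would then model; the classifier itself is outside the scope of this sketch.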
Keywords: Speaker recognition; Discrete cosine transform; Fuzzy vector quantization; K-means; Linde–Buzo–Gray; Mel frequency cepstral coefficients; Speech feature extraction