An important task of speaker verification is to generate speaker-specific models and to match an input speaker's utterance against these models. This paper compares the performance of a text-dependent speaker verification system that uses Mel Frequency Cepstral Coefficient (MFCC) features together with different Vector Quantization (VQ) based speaker modelling techniques to generate the speaker-specific models. Speaker-specific information is mainly represented by spectral features, and from these features we build the model that serves as the key entity for verifying the speaker's claimed identity. For modelling, we used Linde-Buzo-Gray (LBG) VQ, a proposed adaptive LBG VQ, and Fuzzy C-Means (FCM) VQ to generate the speaker-specific models. Experiments performed on a microphone speech database show that accuracy depends on the representation of the codebook, depends significantly on the codebook size in all VQ techniques, and, for FCM VQ, also depends on the value of the learning parameter of the objective function.
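The modelling and matching steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses textbook LBG (binary splitting followed by Lloyd refinement), standard Fuzzy C-Means with fuzziness exponent `m` (assumed here to correspond to the paper's "learning parameter"), squared Euclidean distortion, and a hypothetical acceptance `threshold`. The function names `lbg_codebook`, `fcm_codebook`, `avg_distortion` and `verify` are hypothetical, MFCC extraction is omitted, synthetic 2-D vectors stand in for MFCC frames, and the paper's adaptive LBG variant is not shown.

```python
import math
import random
from typing import List

Vec = List[float]


def sq_dist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors: List[Vec]) -> Vec:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lbg_codebook(features: List[Vec], size: int, eps: float = 0.01,
                 tol: float = 1e-4, max_iter: int = 50) -> List[Vec]:
    """Classical LBG: start from the global centroid, split every codeword
    into (1+eps)*c and (1-eps)*c, then refine with Lloyd (k-means) passes.
    Assumes `size` is a power of two."""
    codebook = [centroid(features)]
    while len(codebook) < size:
        codebook = [[c * (1 + s) for c in cw] for cw in codebook for s in (eps, -eps)]
        prev = math.inf
        for _ in range(max_iter):
            cells: List[List[Vec]] = [[] for _ in codebook]
            total = 0.0
            for f in features:
                # Nearest-codeword assignment.
                d, k = min((sq_dist(f, cw), k) for k, cw in enumerate(codebook))
                cells[k].append(f)
                total += d
            distortion = total / len(features)
            # Centroid update; an empty cell keeps its old codeword.
            codebook = [centroid(cell) if cell else cw for cell, cw in zip(cells, codebook)]
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook


def fcm_codebook(features: List[Vec], size: int, m: float = 2.0,
                 tol: float = 1e-6, max_iter: int = 100, seed: int = 0) -> List[Vec]:
    """Fuzzy C-Means: minimises sum_i sum_k u_ik^m * ||x_i - v_k||^2 with m > 1."""
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(features, size)]
    dim = len(features[0])
    for _ in range(max_iter):
        # Membership: u_ik = 1 / sum_j (d_ik / d_ij)^(1/(m-1)), d = squared distance.
        memberships = []
        for f in features:
            d = [max(sq_dist(f, v), 1e-12) for v in centers]
            memberships.append([1.0 / sum((d[k] / d[j]) ** (1.0 / (m - 1)) for j in range(size))
                                for k in range(size)])
        # Centers: mean of all vectors weighted by u_ik^m.
        new_centers = []
        for k in range(size):
            w = [row[k] ** m for row in memberships]
            tw = sum(w)
            new_centers.append([sum(wi * f[t] for wi, f in zip(w, features)) / tw
                                for t in range(dim)])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def avg_distortion(features: List[Vec], codebook: List[Vec]) -> float:
    """Mean squared distance from each test vector to its nearest codeword."""
    return sum(min(sq_dist(f, cw) for cw in codebook) for f in features) / len(features)


def verify(test_features: List[Vec], claimed_codebook: List[Vec], threshold: float) -> bool:
    """Accept the identity claim if distortion is at or below a hypothetical threshold."""
    return avg_distortion(test_features, claimed_codebook) <= threshold


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic 2-D stand-ins for MFCC frames; real MFCC vectors are higher-dimensional.
    target = [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
              for cx, cy in [(1, 1), (4, 4)] for _ in range(50)]
    impostor = [[rng.gauss(-3, 0.3), rng.gauss(6, 0.3)] for _ in range(50)]
    for name, cb in [("LBG", lbg_codebook(target, 4)), ("FCM", fcm_codebook(target, 4, m=2.0))]:
        print(name, avg_distortion(target, cb), avg_distortion(impostor, cb))
```

The genuine speaker's frames should yield a much lower average distortion against their own codebook than an impostor's frames, which is what a threshold test exploits. Varying `size` in either trainer, or `m` in `fcm_codebook` (larger values give softer memberships), is one way to probe the codebook-size and learning-parameter dependence the abstract reports; pure Python is used only for readability, and a practical system would vectorise these loops.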
About this article
Cite this article
Soni, B., Debnath, S. & Das, P.K. Text-dependent speaker verification using classical LBG, adaptive LBG and FCM vector quantization. Int J Speech Technol 19, 525–536 (2016). https://doi.org/10.1007/s10772-016-9346-4