Text-dependent speaker verification using classical LBG, adaptive LBG and FCM vector quantization

Abstract

An important task in speaker verification is to generate speaker-specific models and match an input speaker’s utterance against them. This paper compares the performance of a text-dependent speaker verification system that uses Mel Frequency Cepstral Coefficient (MFCC) features with different Vector Quantization (VQ) based speaker modelling techniques. Speaker-specific information is mainly represented by spectral features, and from these features we build the model used to verify the claimed identity of the speaker. For modelling, we use Linde, Buzo, Gray (LBG) VQ, a proposed adaptive LBG VQ, and Fuzzy C-Means (FCM) VQ to generate the speaker-specific models. Experiments on a microphone database show that accuracy depends significantly on the codebook size in all VQ techniques and, for FCM VQ, also on the value of the learning parameter of the objective function. The results further show how the accuracy of the speaker verification system depends on the representation of the codebook, the codebook size, and the learning parameter in FCM VQ.
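As a rough illustration of the VQ modelling step described above, the following sketch trains a classical LBG codebook by splitting, refines it with FCM updates, and scores a claim by average distortion. It is not the authors' implementation: the toy 2-D vectors stand in for MFCC frames, the names `lbg`, `fcm`, `avg_distortion` and `THRESHOLD` are hypothetical, the codebook size is assumed to be a power of two, and the "learning parameter" is assumed here to be the FCM fuzzifier `m` (m > 1). The proposed adaptive LBG variant is not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vec(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def avg_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def lbg(vectors, size, eps=0.01, tol=1e-4, max_iter=50):
    """Classical LBG: start from the global mean, split, then refine.
    Assumes `size` is a power of two (the codebook doubles at each split)."""
    codebook = [mean_vec(vectors)]
    while len(codebook) < size:
        # Split every codeword into a pair shifted by +eps and -eps.
        codebook = [[x + s * eps for x in c] for c in codebook for s in (1, -1)]
        prev = float("inf")
        for _ in range(max_iter):
            cells = [[] for _ in codebook]
            for v in vectors:
                k = min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
                cells[k].append(v)
            # Move each codeword to its cell mean; empty cells keep the old codeword.
            codebook = [mean_vec(cell) if cell else c for cell, c in zip(cells, codebook)]
            d = avg_distortion(vectors, codebook)
            if prev - d <= tol * d:  # stop when the improvement is negligible
                break
            prev = d
    return codebook

def fcm(vectors, init, m=2.0, iters=30):
    """Fuzzy C-means updates starting from the centres in `init` (requires m > 1)."""
    centres = [list(c) for c in init]
    dim = len(vectors[0])
    for _ in range(iters):
        # Membership of each vector in each centre, from inverse squared distances.
        u = []
        for v in vectors:
            w = [max(sq_dist(v, c), 1e-12) ** (-1.0 / (m - 1.0)) for c in centres]
            total = sum(w)
            u.append([wk / total for wk in w])
        # Each centre becomes the mean of all vectors weighted by membership ** m.
        centres = [
            [sum((row[k] ** m) * v[j] for row, v in zip(u, vectors))
             / sum(row[k] ** m for row in u) for j in range(dim)]
            for k in range(len(centres))
        ]
    return centres

# Toy 2-D "feature vectors" standing in for MFCC frames (illustrative only).
enrol = [[0.01 * (i % 7), 0.01 * (i % 5)] for i in range(40)] + \
        [[5 + 0.01 * (i % 7), 5 + 0.01 * (i % 5)] for i in range(40)]
impostor = [[x + 2.0, y - 2.0] for x, y in enrol]

lbg_cb = lbg(enrol, size=2)
fcm_cb = fcm(enrol, lbg_cb, m=2.0)
THRESHOLD = 0.5  # hypothetical decision threshold, not taken from the paper
for name, cb in (("LBG", lbg_cb), ("FCM", fcm_cb)):
    # A claim is accepted when the test utterance's distortion is below the threshold.
    print(name, avg_distortion(enrol, cb) < THRESHOLD, avg_distortion(impostor, cb) < THRESHOLD)
```

In this sketch, enlarging `size` lowers the training distortion, and changing `m` alters how sharply each vector is assigned to a centre. These are the two quantities the abstract names as affecting accuracy.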

Corresponding author

Correspondence to Badal Soni.

Cite this article

Soni, B., Debnath, S. & Das, P.K. Text-dependent speaker verification using classical LBG, adaptive LBG and FCM vector quantization. Int J Speech Technol 19, 525–536 (2016). https://doi.org/10.1007/s10772-016-9346-4

Keywords

  • Text dependent speaker verification
  • VAD
  • MFCC
  • VQ
  • FCM