Source and System Features for Text Independent Speaker Recognition Using GMM Speaker Models

  • Conference paper
Recent Trends in Networks and Communications (WeST 2010, VLSI 2010, NeCoM 2010, ASUC 2010, WiMoN 2010)

Abstract

The main objective of this paper is to explore the effectiveness of perceptual features combined with pitch for text independent speaker recognition. In the proposed algorithm, these features are extracted and a Gaussian mixture model (GMM) is trained for every speaker to represent the L feature vectors of that speaker's speech. Speakers are identified by first computing the a posteriori probability density function between the mixtures of each speaker model and the test speech vectors; the test speech is then assigned to the speaker model that yields the maximum probability density. The algorithm achieves an overall accuracy of 98% for the mel frequency perceptual linear predictive cepstrum combined with pitch when identifying a speaker among 8 speakers chosen randomly from 8 different dialect regions of the "TIMIT" database, using GMM speaker models with 12 mixtures. For the same feature, it achieves an average accuracy of 95.75% for 8 speakers chosen randomly from the same dialect region, again with 12-mixture GMM speaker models. The mel frequency linear predictive cepstrum achieves accuracies of 96.75% and 96.125% with 16-mixture GMM speaker models for speakers from different dialect regions and from the same dialect region, respectively. The algorithm is also evaluated with GMM speaker models of 4, 8 and 32 mixtures. The 12-mixture GMM speaker models are additionally tested on a population of 20 speakers, and the accuracy is found to be slightly lower than for the population of 8 speakers. A noteworthy feature of the speaker identification algorithm is that testing is carried out on identical messages for all speakers. The work is extended to speaker verification, whose performance is measured in terms of % false rejection rate, % false acceptance rate and % equal error rate.
The % false acceptance rate and % equal error rate are lower for the mel frequency perceptual linear predictive cepstrum with pitch, whereas the % false rejection rate is lower for the mel frequency linear predictive cepstrum. The F-ratio is computed on the features of the training speech as a theoretical measure to validate the experimental results for perceptual features with pitch. The χ² distribution is used to statistically justify the experimental results for all the features, for both speaker identification and verification.
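
The identification rule summarized in the abstract (score the test vectors against every speaker's GMM and pick the best-scoring model) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes diagonal covariances and equal speaker priors, so that choosing the maximum posterior reduces to choosing the maximum likelihood. The model parameters, the speaker names `spk_a` and `spk_b`, and the test frames are made-up toy values; training the mixtures (for example with expectation-maximization) and feature extraction are not shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a sequence of feature vectors under one GMM.

    gmm is a list of (weight, mean, var) tuples, one per mixture component.
    """
    total = 0.0
    for x in frames:
        # Per-component log terms, combined with log-sum-exp for stability.
        terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def identify(frames, speaker_models):
    """Return the best-scoring speaker name and the score of every model."""
    scores = {
        name: gmm_log_likelihood(frames, gmm)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical 2-component models over 2-dimensional features (toy values).
speaker_models = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "spk_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [7.0, 7.0], [1.0, 1.0])],
}

# Hypothetical test feature vectors, placed close to the spk_a components.
test_frames = [[0.1, -0.2], [1.9, 2.1], [0.5, 0.4]]

best, scores = identify(test_frames, speaker_models)
print(best)  # prints "spk_a"
```

In a system like the one described, each model would have 12 or 16 components and the frames would be perceptual cepstral features with pitch appended; the decision rule itself stays the same.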

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Revathi, A., Venkataramani, Y. (2010). Source and System Features for Text Independent Speaker Recognition Using GMM Speaker Models. In: Meghanathan, N., Boumerdassi, S., Chaki, N., Nagamalai, D. (eds) Recent Trends in Networks and Communications. WeST, VLSI, NeCoM, ASUC, WiMoN 2010. Communications in Computer and Information Science, vol 90. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14493-6_3

  • DOI: https://doi.org/10.1007/978-3-642-14493-6_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14492-9

  • Online ISBN: 978-3-642-14493-6

  • eBook Packages: Computer Science, Computer Science (R0)
