
Review of various stages in speaker recognition system, performance measures and recognition toolkits

Analog Integrated Circuits and Signal Processing

Abstract

Speaker recognition is a key application of speech processing. It is the task of identifying or verifying (authenticating) a speaker from features of the speech signal that characterize that individual. Speaker-specific characteristics such as fundamental frequency, pitch, speaking style, and duration serve as distinguishing components of the human speech signal. Exploiting these characteristics across applications, with the aim of building a robust speaker recognition system, has driven research in this domain. This paper reviews the available feature extraction and recognition techniques together with their merits and demerits. It also discusses the pre-emphasis stage of the speaker recognition system. The standard databases available for speaker recognition, along with the criteria for selecting them, are also reviewed. Finally, the paper presents an overview of the toolkits and performance parameters of automatic speaker recognition systems.
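The abstract names the pre-emphasis stage without giving its form. A widely used choice in the speech-processing literature is the first-order high-pass filter y[n] = x[n] - αx[n-1], whose gain is 1 - α at DC and 1 + α at the Nyquist frequency, so higher frequencies are boosted relative to lower ones before feature extraction. The sketch below is illustrative only and is not taken from the paper; the function name `pre_emphasis` and the default α = 0.97 are assumptions (0.95 is also common).

```python
from typing import List, Sequence


def pre_emphasis(signal: Sequence[float], alpha: float = 0.97) -> List[float]:
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1].

    Hypothetical helper for illustration; alpha = 0.97 is an assumed default.
    The first sample has no predecessor, so it is passed through unchanged.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha should lie in [0, 1)")
    if len(signal) == 0:
        return []
    out = [float(signal[0])]
    for n in range(1, len(signal)):
        # Subtract a scaled copy of the previous sample; slowly varying
        # (low-frequency) content largely cancels, rapid changes are kept.
        out.append(signal[n] - alpha * signal[n - 1])
    return out


if __name__ == "__main__":
    # A constant (DC) input is attenuated to about 1 - alpha after the first sample.
    print(pre_emphasis([1.0, 1.0, 1.0, 1.0]))
    # An alternating input (Nyquist frequency) is amplified to about 1 + alpha.
    print(pre_emphasis([1.0, -1.0, 1.0]))
```

Writing the filter as an explicit loop keeps the difference equation visible; in practice the same operation is usually applied to a whole array at once with a vectorized signal-processing routine.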



Author information


Correspondence to Rupali V. Pawar.


About this article


Cite this article

Pawar, R.V., Jalnekar, R.M. & Chitode, J.S. Review of various stages in speaker recognition system, performance measures and recognition toolkits. Analog Integr Circ Sig Process 94, 247–257 (2018). https://doi.org/10.1007/s10470-017-1069-1


  • DOI: https://doi.org/10.1007/s10470-017-1069-1
