Review of various stages in speaker recognition system, performance measures and recognition toolkits

Pawar, Rupali V.; Jalnekar, Rajesh M.; Chitode, Janardan S.

doi:10.1007/s10470-017-1069-1

Published: 05 December 2017

Volume 94, pages 247–257, (2018)

Analog Integrated Circuits and Signal Processing

Rupali V. Pawar¹,
Rajesh M. Jalnekar² &
Janardan S. Chitode²

Abstract

Speaker recognition is a vital application of speech processing. It authenticates or identifies a speaker from captured features that characterize that speaker. Features unique to an individual, such as fundamental frequency, speaking style, pitch, and duration, serve as distinguishing components of the human speech signal. The attempt to exploit these characteristics to build a robust speaker recognition system for various applications has been the impetus behind research in this domain. This paper presents the available feature extraction and recognition techniques with their merits and demerits. It also discusses the pre-emphasis stage of the speaker recognition system. The standard databases available for speaker recognition are reviewed along with the criteria for their selection. The paper also gives an overview of various toolkits and performance parameters of automatic speaker recognition systems.

Author information

Authors and Affiliations

Sinhgad College of Engineering, Pune, Maharashtra, India
Rupali V. Pawar
Vishwakarma Institute of Technology, Pune, Maharashtra, India
Rajesh M. Jalnekar & Janardan S. Chitode

Authors

Rupali V. Pawar
Rajesh M. Jalnekar
Janardan S. Chitode

Corresponding author

Correspondence to Rupali V. Pawar.

About this article

Cite this article

Pawar, R.V., Jalnekar, R.M. & Chitode, J.S. Review of various stages in speaker recognition system, performance measures and recognition toolkits. Analog Integr Circ Sig Process 94, 247–257 (2018). https://doi.org/10.1007/s10470-017-1069-1

Received: 25 August 2016
Accepted: 27 October 2017
Published: 05 December 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s10470-017-1069-1
