Abstract
Speaker recognition is a vital application of speech processing. A speaker recognition system authenticates or identifies a speaker from captured features that characterize that particular speaker. Characteristics unique to an individual, such as fundamental frequency, speaking style, pitch, and duration, serve as the distinguishing components of the human speech signal. Exploring these characteristics across applications, with the aim of building a robust speaker recognition system, has driven research in this domain. This paper reviews the available feature extraction and recognition techniques, together with their merits and demerits. It also discusses the pre-emphasis stage of the speaker recognition system. The standard databases available for speaker recognition, along with the criteria for selecting them, are also reviewed. Finally, the paper presents an overview of various toolkits and of the performance parameters used to evaluate automatic speaker recognition systems.
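The abstract names the pre-emphasis stage without giving its form. As a minimal sketch, assuming the first-order high-pass filter y[n] = x[n] - alpha * x[n-1] that is conventional in speech front ends, and assuming a default alpha = 0.97 (neither the formula nor the coefficient is stated in this abstract), the stage could look like the following. The function name `pre_emphasis` is hypothetical.

```python
def pre_emphasis(signal, alpha=0.97):
    """Hypothetical helper: first-order pre-emphasis y[n] = x[n] - alpha * x[n-1].

    The default alpha = 0.97 is an assumed, commonly used value,
    not one taken from the paper.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha is expected in [0, 1)")
    if not signal:
        return []
    # The first sample has no predecessor, so it is passed through unchanged.
    out = [float(signal[0])]
    for n in range(1, len(signal)):
        # Subtracting a scaled copy of the previous sample attenuates slowly
        # varying (low-frequency) content and boosts rapid changes.
        out.append(signal[n] - alpha * signal[n - 1])
    return out


# A constant (DC) input is attenuated; an alternating (high-frequency) input is boosted.
print(pre_emphasis([1.0] * 5, alpha=0.5))                # [1.0, 0.5, 0.5, 0.5, 0.5]
print(pre_emphasis([1.0, -1.0, 1.0, -1.0], alpha=0.5))   # [1.0, -1.5, 1.5, -1.5]
```

The two calls show why the filter is used. The spectral tilt of voiced speech concentrates energy at low frequencies, and pre-emphasis flattens that tilt so higher-frequency detail carries more weight in the features extracted afterwards.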
About this article
Cite this article
Pawar, R.V., Jalnekar, R.M. & Chitode, J.S. Review of various stages in speaker recognition system, performance measures and recognition toolkits. Analog Integr Circ Sig Process 94, 247–257 (2018). https://doi.org/10.1007/s10470-017-1069-1
DOI: https://doi.org/10.1007/s10470-017-1069-1