Modified group delay feature based total variability space modelling for speaker recognition

International Journal of Speech Technology

Abstract

In this paper, modified group delay (MODGD) features are used to model target speakers in the Total Variability Space (TVS) framework for speaker recognition. MODGD-based features have been shown to improve speaker recognition performance owing to the ability of group delay functions to emphasise formants. The basis vectors of the TVS are estimated using the probabilistic principal component analysis (PPCA) algorithm, while i-vectors for a speaker are extracted using the conventional technique. The estimation of the total variability space is simplified by a simple transformation of the supervectors. This results in a significant speed-up in the estimation of the TVS hyperparameters, as the computational complexity of the PPCA algorithm is lower than that of the conventional procedure. This matters because the estimation procedure must handle large amounts of data. The technique has already been shown to provide a speed-up of 16\(\times\). The performance of the MODGD-based system is compared with that of the MFCC-based system on the NIST SRE 2010 benchmark dataset. Two types of fusion are tested in this work: fusion at the i-vector level and fusion at the score level. A considerable performance improvement is observed in terms of the Equal Error Rate (EER) by employing these fusion techniques. A robust speaker recognition system with decreased development time is obtained as a result.
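The two fusion levels and the EER metric mentioned above can be illustrated with a minimal Python sketch. The abstract does not specify how the fusion is performed, so the concatenation of length-normalised i-vectors and the weighted sum of scores used here are assumptions, as are the function names (`fuse_ivectors`, `fuse_scores`, `equal_error_rate`) and the `weight` parameter. The EER routine is a simple threshold sweep and not necessarily the scoring tool used in the paper.

```python
import numpy as np


def length_normalise(x):
    """Scale each row vector to unit Euclidean norm."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def fuse_ivectors(iv_a, iv_b):
    """i-vector-level fusion (assumed form): concatenate the length-normalised
    i-vectors of two feature streams, e.g. MFCC and MODGD, per utterance."""
    return np.hstack([length_normalise(iv_a), length_normalise(iv_b)])


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Score-level fusion (assumed form): convex combination of the trial
    scores produced by two independently built systems."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores
    and picking the point where miss and false-alarm rates are closest."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    # Miss: a target trial scored below the threshold.
    miss = np.array([(tgt < t).mean() for t in thresholds])
    # False alarm: a non-target trial scored at or above the threshold.
    fa = np.array([(non >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(miss - fa))
    return 0.5 * (miss[idx] + fa[idx])
```

In such a setup, the fused i-vectors would be passed to a single back-end classifier, whereas score-level fusion combines the outputs of two complete systems; the EER of each variant can then be compared on the same trial list.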

Author information

Corresponding author

Correspondence to Srikanth R. Madikeri.

Additional information

This work was done when the first author was a Ph.D. candidate at IIT Madras.

About this article

Cite this article

Madikeri, S.R., Talambedu, A. & Murthy, H.A. Modified group delay feature based total variability space modelling for speaker recognition. Int J Speech Technol 18, 17–23 (2015). https://doi.org/10.1007/s10772-014-9243-7

  • DOI: https://doi.org/10.1007/s10772-014-9243-7
