Noise robust speaker verification via the fusion of SNR-independent and SNR-dependent PLDA

Pang, Xiaomin; Mak, Man-Wai

doi:10.1007/s10772-015-9310-8

Published: 12 October 2015

Volume 18, pages 633–648, (2015)

International Journal of Speech Technology

Abstract

While i-vectors with probabilistic linear discriminant analysis (PLDA) can achieve state-of-the-art performance in speaker verification, the mismatch caused by acoustic noise remains a key factor affecting system performance. In this paper, a fusion system that combines a multi-condition signal-to-noise ratio (SNR)-independent PLDA model and a mixture of SNR-dependent PLDA models is proposed to make speaker verification systems more noise robust. First, the whole range of SNR over which a verification system is expected to operate is divided into several narrow ranges. Then, a set of SNR-dependent PLDA models, one for each narrow SNR range, is trained. During verification, the SNR of the test utterance is used to determine which of the SNR-dependent PLDA models is used for scoring. To further enhance performance, the SNR-dependent and SNR-independent models are fused using linear and logistic regression fusion. The performance of the fusion system and the SNR-dependent system is evaluated on the NIST 2012 speaker recognition evaluation for both noisy and clean conditions. Results show that a mixture of SNR-dependent PLDA models performs better in both clean and noisy conditions. It was also found that the fusion system is more robust than the conventional i-vector/PLDA systems under noisy conditions.
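The scoring pipeline described in the abstract (pick an SNR-dependent model from the test utterance's SNR, then fuse its score with the SNR-independent score) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the SNR boundaries in `SNR_EDGES_DB`, the function names, the representation of each PLDA model as a plain scoring callable, and the use of plain gradient descent to fit the logistic regression fusion weights.

```python
import bisect
import math

# Hypothetical SNR boundaries in dB (not from the paper).
# They define four narrow ranges: <3, [3, 10), [10, 20), >=20.
SNR_EDGES_DB = [3.0, 10.0, 20.0]


def select_model_index(snr_db, edges=SNR_EDGES_DB):
    """Return the index of the narrow SNR range that contains snr_db."""
    return bisect.bisect_right(edges, snr_db)


def snr_dependent_score(snr_db, dependent_scorers, enrol, test):
    """Score a trial with the scorer whose SNR range matches the test SNR.

    dependent_scorers is a list of callables (enrol, test) -> float,
    one per SNR range; each stands in for a trained PLDA model.
    """
    scorer = dependent_scorers[select_model_index(snr_db)]
    return scorer(enrol, test)


def linear_fusion(score_dep, score_indep, w_dep=0.5, w_indep=0.5, bias=0.0):
    """Weighted sum of the SNR-dependent and SNR-independent scores."""
    return w_dep * score_dep + w_indep * score_indep + bias


def fit_logistic_fusion(score_pairs, labels, lr=0.1, epochs=1000):
    """Fit (w_dep, w_indep, bias) by gradient descent on the logistic loss.

    score_pairs holds (score_dep, score_indep) per development trial;
    labels holds 1 for target trials and 0 for non-target trials.
    """
    w = [0.0, 0.0, 0.0]
    n = len(labels)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (s_dep, s_ind), y in zip(score_pairs, labels):
            z = w[0] * s_dep + w[1] * s_ind + w[2]
            p = 1.0 / (1.0 + math.exp(-z))  # predicted target probability
            err = p - y
            grad[0] += err * s_dep
            grad[1] += err * s_ind
            grad[2] += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return tuple(w)


if __name__ == "__main__":
    # Toy development scores, invented purely to exercise the code.
    dev_pairs = [(2.0, 1.5), (1.0, 2.0), (-1.5, -1.0), (-2.0, -0.5)]
    dev_labels = [1, 1, 0, 0]
    w_dep, w_indep, bias = fit_logistic_fusion(dev_pairs, dev_labels)
    print(linear_fusion(2.0, 1.5, w_dep, w_indep, bias))
```

In this sketch the fused score is the same weighted sum in both cases; the difference is only whether the weights are set by hand or learned from labelled development trials. A real system would replace the stand-in callables with actual PLDA log-likelihood-ratio scoring and would typically use a dedicated calibration toolkit for the fusion step.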

Notes

http://dnt.kr.hsnr.de/download.html.

Acknowledgments

This work was in part supported by The Hong Kong Research Grant Council (Grant Nos. PolyU 152117/14E and PolyU 152068/15E) and The Hong Kong Polytechnic University (Grant No. 4-ZZCX).

Author information

Authors and Affiliations

Centre for Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Xiaomin Pang & Man-Wai Mak

Corresponding author

Correspondence to Man-Wai Mak.

About this article

Cite this article

Pang, X., Mak, MW. Noise robust speaker verification via the fusion of SNR-independent and SNR-dependent PLDA. Int J Speech Technol 18, 633–648 (2015). https://doi.org/10.1007/s10772-015-9310-8

Received: 23 May 2015
Accepted: 29 September 2015
Published: 12 October 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s10772-015-9310-8
