Abstract
This paper presents a novel approach to estimating the confidence interval of speaker verification scores. The approach is used to minimise the utterance length required to produce a confident verification decision. The confidence estimation method is also extended to address two further problems: the high correlation between the scores of consecutive frames, and robustness when only very limited training samples are available. The proposed technique achieves a drastic reduction in the amount of speech typically required to produce confident decisions in an automatic speaker verification system. When evaluated on the NIST 2005 Speaker Recognition Evaluation (SRE), the early verification decision method shows that an average of 5–10 seconds of speech is sufficient to produce verification rates approaching those previously achieved with an average of more than 100 seconds of speech.
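The abstract describes the early decision idea only at a high level. Below is a minimal sketch of one way such a decision rule can work. It is not the authors' method, and all of its details are assumptions: the utterance score is the mean of per-frame scores, correlation between consecutive frames is reduced by averaging non-overlapping blocks of frames (a generic technique substituted here), and the confidence interval of the mean uses a normal approximation. The function `early_decision` and its parameters (`threshold`, `confidence`, `min_frames`, `block`) are hypothetical.

```python
import math
from statistics import NormalDist


def early_decision(frame_scores, threshold=0.0, confidence=0.99,
                   min_frames=100, block=10):
    """Hypothetical sketch of a confidence-based early verification decision.

    Assumptions (not taken from the paper): the utterance score is the mean of
    the per-frame scores; consecutive frames are decorrelated by averaging them
    in non-overlapping blocks of `block` frames; the confidence interval of the
    mean is a normal approximation computed over the block means.

    Returns (decision, frames_used), where decision is "accept", "reject" or
    "undecided" (the scores ran out before the interval cleared the threshold).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    block_means = []
    current = []
    n = 0
    for n, score in enumerate(frame_scores, start=1):
        current.append(score)
        if len(current) < block:
            continue
        # Close the block: its mean is treated as one roughly independent sample.
        block_means.append(sum(current) / block)
        current = []
        if n < min_frames or len(block_means) < 2:
            continue
        m = len(block_means)
        mean = sum(block_means) / m
        var = sum((b - mean) ** 2 for b in block_means) / (m - 1)
        half_width = z * math.sqrt(var / m)
        # Stop as soon as the whole interval lies on one side of the threshold.
        if mean - half_width > threshold:
            return "accept", n
        if mean + half_width < threshold:
            return "reject", n
    return "undecided", n
```

With an assumed frame rate of 100 frames per second (10 ms hop), `min_frames=100` means no decision is attempted before one second of speech. Treating correlated frames as independent would make the interval too narrow and the decision overconfident, which is why the sketch averages blocks first. With only a few blocks, a Student-t critical value would give a more conservative interval than the normal approximation used here.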
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vogt, R., Sridharan, S. (2009). Minimising Speaker Verification Utterance Length through Confidence Based Early Verification Decisions. In: Tistarelli, M., Nixon, M.S. (eds) Advances in Biometrics. ICB 2009. Lecture Notes in Computer Science, vol 5558. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01793-3_47
DOI: https://doi.org/10.1007/978-3-642-01793-3_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01792-6
Online ISBN: 978-3-642-01793-3
eBook Packages: Computer Science, Computer Science (R0)