On the use of Total Variability and Probabilistic Linear Discriminant Analysis for Speaker Verification on Short Utterances

  • Javier González Domínguez
  • Rubén Zazo
  • Joaquin González-Rodríguez
Part of the Communications in Computer and Information Science book series (CCIS, volume 328)

Abstract

This paper explores the use of state-of-the-art acoustic systems, namely Total Variability (TV) and Probabilistic Linear Discriminant Analysis (PLDA), for speaker verification on short utterances. Recent advances in dealing with the session variability problem have greatly improved speaker verification performance in typical scenarios where a reasonable amount of speech is available, but this performance degrades rapidly when limited data is available in both the enrolment and verification stages. This paper studies the behaviour of TV and PLDA in scenarios where only a scarce amount of speech (~10 s) is available to train and test a speaker identity. The analysis has been carried out on the well-defined, standard 10s-10s task of the NIST Speaker Recognition Evaluation 2010 (NIST SRE10), and it explores the multiple parameters that define TV and PLDA in order to give some insight into their relevance in this specific scenario.

Keywords

i-vectors, total variability, PLDA, short utterances

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Javier González Domínguez (1)
  • Rubén Zazo (1)
  • Joaquin González-Rodríguez (1)
  1. Biometric Recognition Group (ATVS), Escuela Politecnica Superior, Universidad Autonoma de Madrid, Spain
