Unsupervised Training of PLDA with Variational Bayes

  • Jesús Villalba
  • Eduardo Lleida
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)

Abstract

Speaker recognition relies on models that need a large amount of labeled development data. These models are successful in tasks like NIST SRE, where sufficient data is available. However, in real applications, we usually do not have so much data, and the speaker labels are unknown. We used a variational Bayes procedure to train PLDA on unlabeled data. The method consists of a generative model in which both the unknown labels and the model parameters are latent variables. We experimented on unlabeled NIST SRE data. The trained models were evaluated on NIST SRE10. Compared to cosine distance, unsupervised PLDA improved EER by 28% and minimum DCF by 36%.
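
As a rough illustration of the baseline and metric named above, the following sketch scores pairs of speaker vectors by cosine similarity and estimates the equal error rate (EER) from target and non-target scores. It is not the authors' implementation: the function names are hypothetical, the inputs are assumed to be fixed-length speaker vectors (the abstract does not specify the representation), and the threshold sweep is a simple approximation to the EER. The helper `relative_improvement` assumes the reported percentages are relative reductions, which the abstract does not state explicitly.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity between two speaker vectors (higher means more similar)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    Returns the mean of miss rate and false-alarm rate at the threshold
    where the two rates are closest to each other.
    """
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        p_miss = np.mean(tgt < t)   # target trials rejected at threshold t
        p_fa = np.mean(non >= t)    # non-target trials accepted at threshold t
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return float(eer)


def relative_improvement(baseline, new):
    """Relative reduction of an error metric with respect to a baseline."""
    return (baseline - new) / baseline
```

Under the relative-reduction assumption, a 28% improvement in EER would mean the PLDA system's EER is 0.72 times the cosine-distance EER.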

Keywords

speaker recognition, PLDA, unsupervised training, variational Bayes, AHC


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jesús Villalba (1)
  • Eduardo Lleida (1)
  1. ViVoLab, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain
