Journal of Signal Processing Systems, Volume 82, Issue 2, pp 217–228

Exploration of Local Variability in Text-Independent Speaker Verification

  • Liping Chen
  • Kong Aik Lee
  • Bin Ma
  • Wu Guo
  • Haizhou Li
  • Li-Rong Dai

Abstract

The total variability model has been shown to be effective for text-independent speaker verification. It provides a tractable way to estimate the so-called i-vector, which describes the speaker and session variability across a whole utterance. To extract the local session variability that an i-vector neglects, local variability models have been proposed, including the Gaussian-oriented and the dimension-oriented local variability models. This paper presents a consolidated study of the total and local variability models and compares them in full under the same framework. In addition, new extensions are proposed for the existing local variability models. The total variability model and the local variability models are compared through experiments on the NIST SRE’08 and SRE’10 datasets. In these experiments, the dimension-oriented local variability models are shown to capture session variability that is complementary to that estimated by the total variability model.

Keywords

Speaker recognition; Factor analysis; Session variability

Notes

Acknowledgments

The work of Liping Chen was partially supported by the National Natural Science Foundation of China (Grant No. 61273264) and the Electronic Information Industry Development Fund of China (Grant No. 2013-472).

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Liping Chen (1)
  • Kong Aik Lee (2)
  • Bin Ma (2)
  • Wu Guo (1)
  • Haizhou Li (2)
  • Li-Rong Dai (1, email author)
  1. EEIS, USTC, Hefei, China
  2. Singapore, Singapore
