Approaches for Out-of-Domain Adaptation to Improve Speaker Recognition Performance

  • Andrey Shulipa
  • Sergey Novoselov
  • Aleksandr Melnikov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9811)

Abstract

In recent years, speaker recognition (SR) systems have achieved satisfactory performance in the evaluations conducted by NIST. This was made possible by large datasets for training system parameters and by accurate modeling of speaker variability. In such cases the training and test conditions are similar, which ensures good performance in the evaluations. In practical applications, however, training and testing conditions often differ, so the SR system parameters are no longer optimal for the target conditions. This mismatch is the main problem in deploying real-world systems, and it reduces their effectiveness. This paper investigates discriminative and generative approaches to adapting the parameters of speaker recognition systems and proposes effective solutions to improve their performance.

Keywords

Speaker recognition · Domain adaptation · Mismatch conditions

Notes

Acknowledgments

This work was partially financially supported by the Government of the Russian Federation, Grant 074-U01.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Andrey Shulipa (1)
  • Sergey Novoselov (1, 2)
  • Aleksandr Melnikov (2)
  1. ITMO University, Saint Petersburg, Russia
  2. Speech Technology Center, Saint Petersburg, Russia
