Usage of DNN in Speaker Recognition: Advantages and Problems

  • Oleg Kudashev
  • Sergey Novoselov
  • Timur Pekhovsky
  • Konstantin Simonchik
  • Galina Lavrentyeva
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9719)

Abstract

In this paper we consider different approaches to applying artificial neural networks to the speaker recognition task. We investigate the performance of DNNs at two levels of a speaker recognition system: the i-vector extraction level and the model back-end level. The results of our study demonstrate the high efficiency of the proposed neural-network-based approaches. We show that using DNN technology at each of these levels independently increases the reliability of the speaker recognition system. However, such systems also have some disadvantages, which are described in this paper as well.

Keywords

DNN, Speaker recognition, PLDA

Notes

Acknowledgments

This work was partially financially supported by the Government of the Russian Federation, Grant 074-U01.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Oleg Kudashev (1)
  • Sergey Novoselov (1)
  • Timur Pekhovsky (1, 2)
  • Konstantin Simonchik (2)
  • Galina Lavrentyeva (1, 2)
  1. Speech Technology Center, St. Petersburg, Russia
  2. ITMO University, St. Petersburg, Russia
