Usage of DNN in Speaker Recognition: Advantages and Problems
In this paper we consider different approaches to applying artificial neural networks to the speaker recognition task. We investigate the performance of DNNs at different levels of a speaker recognition system: the i-vector extraction level and the back-end model level. The results of our study demonstrate the high efficiency of the proposed neural-network-based approaches for this problem. We show that using DNN technology at each of these levels independently increases the reliability of the speaker recognition system. However, such systems also have some disadvantages, which are described in this paper.
Keywords: DNN, Speaker recognition, PLDA
This work was partially financially supported by the Government of the Russian Federation, Grant 074-U01.