Abstract
Prioritized grid long short-term memory (pGLSTM) has been shown to improve automatic speech recognition (ASR) efficiently. In this paper, we apply this state-of-the-art ASR model to text-independent Chinese speaker verification, adopting the DNN/i-Vector (DNN-based i-Vector) framework with a PLDA backend. To fully explore its performance, we compare the proposed pGLSTM-based UBM with GMM-UBM and HLSTM-UBM. Because the amount of transcribed Chinese speech available for ASR training is limited, we also explore an adaptation method: the pGLSTM-UBM is first trained on a large English corpus, and an adapted PLDA backend is then used to fit the Chinese language before the final speaker verification scoring. Experiments show that both the pGLSTM-UBM with its corresponding PLDA backend and the pGLSTM-UBM with the adapted PLDA backend outperform the traditional GMM-UBM model. In addition, the pGLSTM-UBM with the PLDA backend achieves an equal error rate (EER) of 4.94% on 5 s short utterances and 1.97% on 10 s short utterances, reductions of 47% and 51% relative to the GMM-UBM. The experimental results imply that a DNN trained for ASR can extend the advantage of the UBM, especially on short utterances, and that a better DNN model for ASR can bring additional gains in speaker verification.
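For readers less familiar with the metric, the following minimal sketch illustrates how an EER and a relative reduction of the kind quoted above can be computed from verification scores. It is not the authors' scoring code: the function names, the threshold sweep, and the toy scores are all hypothetical and chosen only for illustration.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (FAR, FRR) when trials scoring >= threshold are accepted."""
    # False acceptance: an impostor (non-target) trial is accepted.
    far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    # False rejection: a genuine (target) trial is rejected.
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    return far, frr


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold."""
    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        far, frr = error_rates(target_scores, nontarget_scores, threshold)
        gap = abs(far - frr)
        # Keep the operating point where FAR and FRR are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Fraction by which new_eer is lower than baseline_eer."""
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical toy scores (higher means "more likely the same speaker").
target = [2.1, 1.7, 0.9, 1.3, -0.2]
nontarget = [-1.5, -0.8, 0.1, -2.0, 1.0]

print(equal_error_rate(target, nontarget))  # 0.2
print(relative_reduction(10.0, 5.0))        # 0.5
```

With real systems the scores would come from the PLDA backend, and the sweep is usually interpolated between thresholds; the coarse sweep here is a simplifying assumption.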
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wang, J., Guo, H., Xiao, J. (2017). Prioritized Grid Highway Long Short-Term Memory-Based Universal Background Model for Speaker Verification. In: Zhou, J., et al. Biometric Recognition. CCBR 2017. Lecture Notes in Computer Science, vol 10568. Springer, Cham. https://doi.org/10.1007/978-3-319-69923-3_63
DOI: https://doi.org/10.1007/978-3-319-69923-3_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69922-6
Online ISBN: 978-3-319-69923-3
eBook Packages: Computer Science, Computer Science (R0)