Prioritized Grid Highway Long Short-Term Memory-Based Universal Background Model for Speaker Verification

Wang, Jianzong; Guo, Hui; Xiao, Jing

doi:10.1007/978-3-319-69923-3_63

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10568)

Included in the following conference series:

Chinese Conference on Biometric Recognition

Abstract

Prioritized grid long short-term memory (pGLSTM) has been shown to improve automatic speech recognition (ASR) efficiently. In this paper, we apply this state-of-the-art ASR model to text-independent Chinese speaker verification, adopting the DNN/i-Vector (DNN-based i-Vector) framework with a PLDA backend. To fully explore its performance, we compare the proposed pGLSTM-based universal background model (UBM) with GMM-UBM and HLSTM-UBM. Because the amount of transcribed Chinese corpus available for ASR training is limited, we also explore an adaptation method: the pGLSTM-UBM is first trained on a large English corpus, and a PLDA adaptation backend is then used to fit the Chinese language before the final speaker verification scoring. Experiments show that both the pGLSTM-UBM with its corresponding PLDA backend and the pGLSTM-UBM with the adapted PLDA backend outperform the traditional GMM-UBM model. Additionally, the pGLSTM-UBM with PLDA backend achieves an equal error rate (EER) of 4.94% on 5 s short utterances and 1.97% on 10 s short utterances, reductions of 47% and 51% compared with the GMM-UBM. The experimental results imply that a DNN from ASR tasks can extend the advantage of the UBM, especially on short utterances, and that a better DNN model for ASR could yield additional gains in speaker verification.
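The results above are reported as EER, the operating point at which the false acceptance rate equals the false rejection rate. As an illustration only, the following is a minimal sketch of how an EER can be estimated from verification scores. It is not the authors' scoring code; the function name `compute_eer` and the toy scores are assumptions introduced here, and the sketch assumes that a higher score means the trial is more likely a target (same-speaker) trial.

```python
def compute_eer(target_scores, nontarget_scores):
    """Estimate the equal error rate (EER) from two lists of trial scores.

    Hypothetical helper for illustration. A trial is accepted when its
    score is greater than or equal to the threshold.
    Returns (eer, threshold).
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (made up for this example, not taken from the paper).
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.1, 0.2, 0.4, 0.6]
eer, threshold = compute_eer(targets, nontargets)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch reports the average of the two rates at the closest threshold; evaluation toolkits may interpolate differently.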

Author information

Authors and Affiliations

Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, China
Jianzong Wang, Hui Guo & Jing Xiao

Corresponding author

Correspondence to Jianzong Wang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jie Zhou
Beihang University, Beijing, China
Yunhong Wang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhenan Sun
Computing and Technology, Chinese Academy of Sciences, Beijing, China
Yong Xu
Shenzhen University, Shenzhen, China
Linlin Shen
Tsinghua University, Beijing, China
Jianjiang Feng
Chinese Academy of Sciences, Beijing, China
Shiguang Shan
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Yu Qiao
Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
Zhenhua Guo
Shenzhen University, Shenzhen, China
Shiqi Yu

About this paper

Cite this paper

Wang, J., Guo, H., Xiao, J. (2017). Prioritized Grid Highway Long Short-Term Memory-Based Universal Background Model for Speaker Verification. In: Zhou, J., et al. Biometric Recognition. CCBR 2017. Lecture Notes in Computer Science, vol 10568. Springer, Cham. https://doi.org/10.1007/978-3-319-69923-3_63

DOI: https://doi.org/10.1007/978-3-319-69923-3_63
Published: 20 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69922-6
Online ISBN: 978-3-319-69923-3
eBook Packages: Computer Science, Computer Science (R0)
