Prioritized Grid Highway Long Short-Term Memory-Based Universal Background Model for Speaker Verification

  • Conference paper
Biometric Recognition (CCBR 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10568)

Abstract

Prioritized grid long short-term memory (pGLSTM) has been shown to improve automatic speech recognition (ASR) efficiently. In this paper, we apply this state-of-the-art ASR model to text-independent Chinese speaker verification, adopting the DNN/i-Vector (DNN-based i-Vector) framework with a PLDA backend. To fully explore its performance, we compare the presented pGLSTM-based UBM with GMM-UBM and HLSTM-UBM. Because the amount of transcribed Chinese corpus available for ASR training is limited, we also explore an adaptation method: the pGLSTM-UBM is first trained on a large English corpus, and a PLDA adaptation backend is then used to fit the Chinese language before the final speaker verification scoring. Experiments show that both the pGLSTM-UBM with the corresponding PLDA backend and the pGLSTM-UBM with the adapted PLDA backend outperform the traditional GMM-UBM. In addition, the pGLSTM-UBM with PLDA backend achieves an EER of 4.94% on 5 s short utterances and 1.97% on 10 s short utterances, reductions of 47% and 51% compared with the GMM-UBM. The experimental results imply that a DNN from ASR tasks can expand the advantage of the UBM, especially on short utterances, and that a better DNN model for ASR could bring extra gains in speaker verification.
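The equal error rate (EER) quoted above is the operating point at which the false-accept rate equals the false-reject rate. The following is a minimal sketch of how such a figure could be computed from verification scores; it is not the authors' evaluation code. The function names `equal_error_rate` and `relative_reduction` and the toy scores are hypothetical, and reading the reported "drop" as a relative reduction is an assumption.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    A trial is accepted when its score is >= threshold.
    Hypothetical helper for illustration only.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False-reject rate: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False-accept rate: different-speaker trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


def relative_reduction(baseline_eer, new_eer):
    """Fractional EER reduction relative to a baseline (assumed interpretation)."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy scores, invented for this example (not data from the paper).
toy_target = [2.0, 1.5, 0.3, 1.8]
toy_nontarget = [-1.0, 0.5, -0.2, 0.1]
toy_eer, toy_threshold = equal_error_rate(toy_target, toy_nontarget)
```

With real trial lists, the candidate thresholds are usually swept over all observed scores as done here; toolkits may interpolate between neighboring operating points, so their values can differ slightly from this sketch.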

Author information

Correspondence to Jianzong Wang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wang, J., Guo, H., Xiao, J. (2017). Prioritized Grid Highway Long Short-Term Memory-Based Universal Background Model for Speaker Verification. In: Zhou, J., et al. Biometric Recognition. CCBR 2017. Lecture Notes in Computer Science, vol 10568. Springer, Cham. https://doi.org/10.1007/978-3-319-69923-3_63

  • DOI: https://doi.org/10.1007/978-3-319-69923-3_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69922-6

  • Online ISBN: 978-3-319-69923-3

  • eBook Packages: Computer Science (R0)