Speaker-Phrase-Specific Adaptation of PLDA Model for Improved Performance in Text-Dependent Speaker Verification

Circuits, Systems, and Signal Processing

Abstract

The i-vector/probabilistic linear discriminant analysis (PLDA) framework has long been widely used in speaker verification. More recently, the introduction of online i-vectors and their integration with dynamic time warping (DTW) template matching have significantly improved the performance of text-dependent speaker verification systems. The PLDA model learns to discriminate among instances of different speaker-phrase classes and also compensates for channel and session variability. However, when exposed to unseen speakers and text, the variability compensation model becomes suboptimal, leading to substantial verification error. This paper proposes PLDA adaptation to incorporate speaker-phrase-dependent variability into the i-vector/PLDA technique. The adapted model is tuned to a particular speaker-phrase class, yielding a better-matched solution. Two adaptation techniques, interpolation and weighted likelihood, are explored. Experiments on Part 1 of the RSR2015 database show relative equal error rate (EER) reductions of up to 58.22% and 45% for the interpolation and weighted likelihood techniques, respectively. Using speaker-phrase-specific mean and whitening parameters yields further improvement, with an EER reduction of up to 20% relative to the adapted models.
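The abstract does not state the adaptation formulas, so the following is only a minimal sketch under stated assumptions. It assumes that interpolation-based adaptation takes the common form of a linear blend between a general covariance matrix and a speaker-phrase-specific one, controlled by a hypothetical weight `alpha`, and it shows how a relative EER reduction is computed. The function names and matrix values are illustrative and are not taken from the paper.

```python
def interpolate_covariance(general, specific, alpha):
    """Blend two covariance matrices: alpha * specific + (1 - alpha) * general.

    Assumption: this linear form is one common way to interpolate PLDA
    parameters; the paper's exact formulation is not given in the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * s + (1.0 - alpha) * g for g, s in zip(g_row, s_row)]
        for g_row, s_row in zip(general, specific)
    ]


def relative_eer_reduction(baseline_eer, adapted_eer):
    """Relative EER reduction in percent: 100 * (baseline - adapted) / baseline."""
    if baseline_eer <= 0.0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - adapted_eer) / baseline_eer


# Hypothetical 2x2 within-class covariances (illustrative values only).
general_within = [[2.0, 0.5], [0.5, 1.0]]
specific_within = [[1.0, 0.1], [0.1, 0.4]]
adapted_within = interpolate_covariance(general_within, specific_within, alpha=0.5)

# Hypothetical EERs: halving the error rate is a 50% relative reduction.
reduction = relative_eer_reduction(baseline_eer=2.0, adapted_eer=1.0)
```

With `alpha = 0` the general model is returned unchanged, and with `alpha = 1` only the speaker-phrase-specific statistics are used; intermediate values trade robustness of the general model against specificity to the class.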

Data Availability

The data that support the findings of this study are publicly available from Exploit Technologies Pte Ltd, Singapore, under a licence agreement.

Acknowledgements

The authors would like to thank the Speech and Image Processing Laboratory of the National Institute of Technology Silchar for supporting the research work.

Author information

Corresponding author

Correspondence to Mohammad Azharuddin Laskar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Laskar, M.A., Bhanja, C.C. & Laskar, R.H. Speaker-Phrase-Specific Adaptation of PLDA Model for Improved Performance in Text-Dependent Speaker Verification. Circuits Syst Signal Process 40, 5127–5151 (2021). https://doi.org/10.1007/s00034-021-01713-w

  • DOI: https://doi.org/10.1007/s00034-021-01713-w
