Improving Robustness of Speaker Verification by Fusion of Prompted Text-Dependent and Text-Independent Operation Modalities

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9811)


In this paper we present a fusion methodology for combining the prompted text-dependent and text-independent operation modalities of speaker verification. Fusion is performed at the score level: the scores produced by single-mode GMM-UBM speaker verification engines are combined using several machine learning classification algorithms. To improve performance, we cluster the score-based data before the classification stage. The experimental results indicate that fusing the two operation modes improves speaker verification performance in terms of both sensitivity and specificity, by approximately 2% and 1.5% respectively.
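The score-level pipeline summarised above (cluster the score vectors, then classify within each cluster) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the scores are synthetic rather than real GMM-UBM outputs, k-means with two clusters stands in for the unspecified clustering step, and a nearest-class-mean rule stands in for the unspecified machine learning classifiers. All function names are hypothetical.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of score vectors."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def nearest(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda i: sqdist(p, centers[i]))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed clustering step); returns the centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def class_means(scores, labels):
    """Mean score vector of each class (0 = impostor, 1 = target) present."""
    out = {}
    for y in (0, 1):
        pts = [s for s, lab in zip(scores, labels) if lab == y]
        if pts:
            out[y] = mean(pts)
    return out

def fit_fusion(scores, labels, k=2):
    """Cluster the (text-dependent, text-independent) score pairs, then
    fit one nearest-class-mean classifier per cluster (stand-in classifier)."""
    centroids = kmeans(scores, k)
    assigned = [nearest(s, centroids) for s in scores]
    per_cluster = []
    for c in range(k):
        idx = [i for i, a in enumerate(assigned) if a == c]
        per_cluster.append(class_means([scores[i] for i in idx],
                                       [labels[i] for i in idx]))
    # Global model used only if a cluster received no training trials.
    return centroids, per_cluster, class_means(scores, labels)

def predict(model, score):
    """Accept (1) or reject (0) a trial from its fused score pair."""
    centroids, per_cluster, fallback = model
    means = per_cluster[nearest(score, centroids)] or fallback
    return min(means, key=lambda y: sqdist(score, means[y]))

def sensitivity_specificity(model, scores, labels):
    """Fraction of target trials accepted and of impostor trials rejected."""
    preds = [predict(model, s) for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    return tp / labels.count(1), tn / labels.count(0)

def make_trials(n, seed=0):
    """Synthetic score pairs: targets centred at +2, impostors at -2."""
    rng = random.Random(seed)
    scores, labels = [], []
    for label, mu in ((1, 2.0), (0, -2.0)):
        for _ in range(n):
            scores.append((rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)))
            labels.append(label)
    return scores, labels

if __name__ == "__main__":
    scores, labels = make_trials(200)
    model = fit_fusion(scores, labels)
    print(sensitivity_specificity(model, scores, labels))
```

Clustering first lets each classifier specialise on one region of the score space, which is one plausible reading of why the clustering stage would help; the sketch does not attempt to reproduce the reported gains.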


Speaker verification · Fusion · Machine learning



This work was partially supported by the H2020 OCTAVE Project entitled “Objective Control for TAlker VErification”, funded by the EC under Grant Agreement number 647850. The authors would like to thank Dr Md Sahidullah, Dr Nicholas Evans and Dr Tomi Kinnunen for their support in this work.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. School of Engineering and Technology, University of Hertfordshire, Hatfield, UK
