Improving Performance of Speaker Identification Systems Using Score Level Fusion of Two Modes of Operation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10458)


In this paper we present a score-level fusion methodology for improving the performance of closed-set speaker identification. The fusion is performed on scores produced by GMM-UBM text-dependent and text-independent speaker identification engines. The experimental results indicate that score-level fusion improves identification performance over the best-performing single mode of operation.
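As a rough illustration of what score-level fusion of two engines can look like, the sketch below combines per-speaker scores from a text-dependent and a text-independent engine and picks the top-scoring enrolled speaker. The abstract does not state the fusion rule, so the z-score normalization, the weighted sum, the `weight` parameter, the function names, and the example scores are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def zscore(scores):
    """Scale one engine's per-speaker scores to zero mean and unit variance."""
    scores = np.asarray(scores, dtype=float)
    centered = scores - scores.mean()
    std = scores.std()
    # Guard against a constant score vector (std == 0).
    return centered / std if std > 0 else centered


def fuse_and_identify(td_scores, ti_scores, weight=0.5):
    """Weighted-sum fusion (an assumed rule) of two engines' scores.

    td_scores, ti_scores: one score per enrolled speaker, in the same order.
    weight: share given to the text-dependent engine (hypothetical parameter).
    Returns the index of the top-scoring speaker and the fused score vector.
    """
    fused = weight * zscore(td_scores) + (1.0 - weight) * zscore(ti_scores)
    return int(np.argmax(fused)), fused


# Hypothetical scores for four enrolled speakers (closed set).
td = [1.2, 0.9, 1.3, -0.5]   # text-dependent engine
ti = [0.1, 0.8, 0.7, -0.2]   # text-independent engine
best, fused = fuse_and_identify(td, ti)
print("identified speaker index:", best)
```

Normalizing each engine's scores before combining them keeps one engine from dominating simply because its scores span a wider range. A trained classifier over the two scores could take the place of the fixed weighted sum.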


Speaker identification, Fusion, Machine learning



This work was partially supported by the H2020 OCTAVE Project entitled “Objective Control for TAlker VErification”, funded by the EC with Grant Agreement number 647850.

The authors would like to thank Dr Md Sahidullah, Dr Nicholas Evans and Dr Tomi Kinnunen for their support in this work.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Engineering and Technology, University of Hertfordshire College, Hertfordshire, UK
