Abstract
In this paper we present a score-level fusion methodology for improving the performance of closed-set speaker identification. The fusion is performed on scores produced by two GMM-UBM speaker identification engines, one text-dependent and one text-independent. The experimental results indicate that score-level fusion improves speaker identification performance compared with the better-performing of the two single operation modes.
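The idea of combining the two engines at the score level can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so everything below is an assumption made for illustration only: the per-engine z-normalisation, the weighted-sum rule, the `weight` parameter, the function names, and the example scores are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Scale one engine's scores to zero mean and unit variance.

    `scores` maps a speaker label to that engine's raw score
    (for example a GMM-UBM log-likelihood ratio).
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: the engine carries no ranking information.
        return {spk: 0.0 for spk in scores}
    return {spk: (s - mu) / sigma for spk, s in scores.items()}


def fuse_scores(td_scores, ti_scores, weight=0.5):
    """Weighted-sum fusion of text-dependent and text-independent scores.

    ASSUMPTION: a weighted sum is used here only as a simple stand-in;
    the paper's actual fusion method is not described in the abstract.
    """
    td = z_normalize(td_scores)
    ti = z_normalize(ti_scores)
    return {spk: weight * td[spk] + (1.0 - weight) * ti[spk] for spk in td}


def identify(fused):
    """Closed-set decision: return the enrolled speaker with the top score."""
    return max(fused, key=fused.get)


# Hypothetical scores for three enrolled speakers (not experimental data).
td_scores = {"spk_a": 1.2, "spk_b": 1.5, "spk_c": -0.3}
ti_scores = {"spk_a": 0.9, "spk_b": 0.2, "spk_c": -0.1}

fused = fuse_scores(td_scores, ti_scores)
best = identify(fused)
print(best)
```

In this toy example the two engines disagree on the top-ranked speaker, and the fused score decides between them. In practice the weight, or a trained classifier used in place of the weighted sum, would be chosen on held-out development data.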
Acknowledgement
This work was partially supported by the H2020 OCTAVE Project, entitled “Objective Control for TAlker VErification”, funded by the EC under Grant Agreement number 647850.
The authors would like to thank Dr Md Sahidullah, Dr Nicholas Evans and Dr Tomi Kinnunen for their support in this work.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Safavi, S., Mporas, I. (2017). Improving Performance of Speaker Identification Systems Using Score Level Fusion of Two Modes of Operation. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_43
DOI: https://doi.org/10.1007/978-3-319-66429-3_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science (R0)