Abstract
Most mobile and handheld devices now use speech and speaker verification systems (SVS). Although these systems perform satisfactorily in constrained conditions, their performance degrades in many real-life unconstrained conditions. Such an SVS cannot accurately authenticate a person when deployed in applications where environmental or channel conditions vary. The actual conditions may differ substantially from those used during system training, which creates large uncertainty in the verification scores obtained during the evaluation phase of the system. In this regard, we have implemented a verification system using the state-of-the-art i-vector-based approach. It is based on the total variability subspace (TVS), which models both speaker and channel variabilities in a single low-dimensional space instead of two different subspaces. Our experiments are conducted using data from the Speakers in the Wild (SITW) database, and the equal error rate (EER) we obtained is 23.16%.
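The EER quoted in the abstract is the operating point at which the false-acceptance rate equals the false-rejection rate. As a rough illustration only, the following minimal sketch shows one common way to estimate it from verification scores. The function name `equal_error_rate` and the toy scores are assumptions made for this example; they are not the authors' code or data, and the abstract does not say how scores were produced.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER and its threshold from two sets of trial scores.

    Hypothetical helper for illustration: a trial is accepted when its
    score is greater than or equal to the threshold.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(np.concatenate([target_scores, nontarget_scores]))

    # False-acceptance rate: fraction of impostor trials that are accepted.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # False-rejection rate: fraction of genuine trials that are rejected.
    frr = np.array([(target_scores < t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores invented for this example (not results from the chapter).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With these toy scores, one of four genuine trials falls below the chosen threshold and one of four impostor trials reaches it, so both error rates are 25%. Evaluation toolkits often interpolate between thresholds rather than picking the nearest one, so values on real data may differ slightly from this estimate.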
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nirmal, A., Jayaswal, D. (2023). Investigating i-Vector Framework for Speaker Verification in Wild Conditions. In: Choudrie, J., Mahalle, P., Perumal, T., Joshi, A. (eds) IOT with Smart Systems. Smart Innovation, Systems and Technologies, vol 312. Springer, Singapore. https://doi.org/10.1007/978-981-19-3575-6_13
DOI: https://doi.org/10.1007/978-981-19-3575-6_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-3574-9
Online ISBN: 978-981-19-3575-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)