Performance Comparison of Recognition System Using I-Vector Based on Different Conditioning Methods
This paper focuses on the analysis of the i-vector paradigm, a compact representation of speaker utterances that is used by most state-of-the-art speaker verification systems. I-vector subspace modeling is a recent method that has become the state-of-the-art technique in this domain. Its main benefit is that it models both intra-domain and inter-domain variability in the same low-dimensional space. In this study, 2656 syllables of bio-acoustic signals from 55 frog species, taken from 5 database sources, are used in a frog identification system. System parameters such as the GMM component size (16, 32, 64 and 128 Gaussians) are first tuned, and the system is then evaluated with 3 different conditioning methods: whitening, Linear Discriminant Analysis (LDA) and Within-Class Covariance Normalization (WCCN). This work was mainly motivated by the need to quantify the impact of these steps on the final performance, especially their ability to model data according to a theoretical Gaussian framework. To this end, we assess the effect of the tuned parameters on the recognition rate. We observed that smaller GMM component sizes combined with WCCN-conditioned i-vectors outperformed the other configurations, reaching an accuracy of 86.67%. These investigations highlight the key points of the approach, in particular a core conditioning procedure that leads to the success of the i-vector paradigm.
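The abstract names the three conditioning methods but not how each transform is estimated. The sketch below shows common textbook forms of whitening, LDA and WCCN, assuming i-vectors have already been extracted (GMM-UBM and total-variability training are not shown) and are stored as rows of a NumPy array with integer class (species) labels. The function names, the synthetic data and the cosine scoring helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def whitening_transform(X):
    """Fit whitening on development i-vectors X (n_samples x dim).

    Returns (mu, W) such that (X - mu) @ W has identity sample covariance.
    """
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    # Scale each eigenvector (column) by 1/sqrt(eigenvalue): W = V diag(lambda^-1/2)
    W = eigvec / np.sqrt(eigval)
    return mu, W


def within_class_covariance(X, labels):
    """Average over classes of each class covariance (normalized by 1/n_c)."""
    classes = np.unique(labels)
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))
    for c in classes:
        Xc = X[labels == c]
        D = Xc - Xc.mean(axis=0)
        Sw += D.T @ D / len(Xc)
    return Sw / len(classes)


def wccn_transform(X, labels):
    """Return B with B @ B.T = inv(Sw), so X @ B has identity within-class covariance."""
    Sw = within_class_covariance(X, labels)
    return np.linalg.cholesky(np.linalg.inv(Sw))


def lda_transform(X, labels, n_components):
    """Solve Sb v = lambda Sw v and keep the n_components leading directions."""
    mu = X.mean(axis=0)
    dim = X.shape[1]
    Sb = np.zeros((dim, dim))
    Sw = np.zeros((dim, dim))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        d = (mc - mu)[:, None]
        Sb += len(Xc) * (d @ d.T)      # between-class scatter
        Sw += (Xc - mc).T @ (Xc - mc)  # within-class scatter
    # Reduce to a symmetric eigenproblem using the Cholesky factor Sw = L L^T
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    eigval, U = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(eigval)[::-1][:n_components]
    return L_inv.T @ U[:, order]


def cosine_score(a, b):
    """Cosine similarity between two conditioned i-vectors (assumed back end)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-in for i-vectors: 5 hypothetical classes, 20 vectors each, dim 4
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 20)
X = rng.normal(size=(100, 4)) + rng.normal(scale=3.0, size=(5, 4))[labels]

mu, W = whitening_transform(X)
X_white = (X - mu) @ W
X_wccn = X @ wccn_transform(X, labels)
X_lda = X @ lda_transform(X, labels, n_components=2)
print(X_white.shape, X_wccn.shape, X_lda.shape)
```

In a typical i-vector back end these transforms are estimated on a development set and then applied to enrollment and test vectors before scoring. The abstract does not say whether the methods were chained (for example LDA followed by WCCN) or compared in isolation, so the sketch fits each one independently.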
Keywords: I-vector extraction, Speaker recognition, Conditioning, Whitening, WCCN, LDA
This work was sponsored and supported by Research University Grant (1001.PELECT.8014057).