The IIR Submission to CSLP 2006 Speaker Recognition Evaluation
This paper describes the design and implementation of a practical automatic speaker recognition system for the CSLP speaker recognition evaluation (SRE). The system is built upon four subsystems that use speaker information from acoustic spectral features. In addition to the conventional spectral features, a novel temporal discrete cosine transform (TDCT) feature is introduced to capture long-term speech dynamics. The speaker information is modeled using two complementary speaker modeling techniques, namely the Gaussian mixture model (GMM) and the support vector machine (SVM). The resulting subsystems are then integrated at the score level through a multilayer perceptron (MLP) neural network. Evaluation results confirm that the feature selection, classifier design, and fusion strategy are successful, giving rise to an effective speaker recognition system.
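To illustrate the idea behind the TDCT feature, the sketch below applies a DCT along the time axis to short blocks of consecutive cepstral frames and keeps only the low-order coefficients, which summarize how each cepstral coefficient evolves over the block. This is a minimal sketch under stated assumptions, not the system's actual front end: the function names, the window length, the number of retained coefficients, and the hop size are all hypothetical choices, since the abstract does not give the configuration.

```python
import numpy as np


def dct_basis(window_len, num_coeffs):
    """Return the first `num_coeffs` orthonormal DCT-II basis vectors as rows."""
    n = np.arange(window_len)
    k = np.arange(num_coeffs)[:, None]
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * window_len))
    basis *= np.sqrt(2.0 / window_len)
    basis[0] /= np.sqrt(2.0)  # scale the DC row so the basis is orthonormal
    return basis  # shape: (num_coeffs, window_len)


def tdct_features(cepstra, window_len=8, num_coeffs=3, hop=1):
    """Temporal DCT over sliding blocks of cepstral frames.

    cepstra: array of shape (num_frames, dim), e.g. MFCC vectors per frame.
    window_len, num_coeffs, hop: illustrative values (assumptions).
    Returns an array of shape (num_blocks, num_coeffs * dim).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames, dim = cepstra.shape
    if num_frames < window_len:
        return np.empty((0, num_coeffs * dim))
    basis = dct_basis(window_len, num_coeffs)
    starts = range(0, num_frames - window_len + 1, hop)
    # For each block, transform the time trajectory of every cepstral
    # coefficient and stack the retained DCT coefficients into one vector.
    return np.stack(
        [(basis @ cepstra[s:s + window_len]).ravel() for s in starts]
    )
```

With these assumed settings, a block of 8 frames of 12-dimensional cepstra yields one 36-dimensional vector. A longer window would cover a longer temporal context at the cost of fewer feature vectors per utterance.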
Keywords: Discrete Cosine Transform, Speaker Recognition, Speaker Verification, Voice Activity Detector, Test Segment