Client Dependent GMM-SVM Models for Speaker Verification

  • Quan Le
  • Samy Bengio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2714)


Generative Gaussian Mixture Models (GMMs) are the dominant approach for modeling speech sequences in text-independent speaker verification because of their scalability, good performance, and ability to handle variable-size sequences. Discriminative models such as Support Vector Machines (SVMs), on the other hand, usually yield better performance in static classification problems and can construct flexible decision boundaries. In this paper, we combine these two complementary models by using SVMs to postprocess the scores obtained by the GMMs. A cross-validation method is also used in the baseline system to increase the number of client scores available in the training phase, which improves the results of the SVM models. Experiments carried out on the XM2VTS and PolyVar databases confirm the value of this hybrid approach.
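The division of labour described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes that each variable-length sequence is reduced to a fixed pair of scores (the average log-likelihood under a client GMM and under a world GMM) and that a linear SVM is then trained on those pairs. The function names, the toy 1-D models, the Pegasos-style trainer, and the input centring are all assumptions made for illustration; the cross-validation step is not shown.

```python
import math

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
            comp.append(ll)
        top = max(comp)  # log-sum-exp over mixture components
        total += top + math.log(sum(math.exp(c - top) for c in comp))
    return total / len(frames)

def score_pair(frames, client_gmm, world_gmm):
    """Map a variable-length sequence to a fixed 2-d vector of GMM scores."""
    return [gmm_avg_loglik(frames, *client_gmm),
            gmm_avg_loglik(frames, *world_gmm)]

def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    Assumption: inputs are centred on the training mean and a constant 1 is
    appended, so the bias is learned (and regularised) as an ordinary weight.
    """
    dim = len(X[0])
    mean = [sum(x[d] for x in X) / len(X) for d in range(dim)]
    Z = [[x[d] - mean[d] for d in range(dim)] + [1.0] for x in X]
    w = [0.0] * (dim + 1)
    t = 0
    for _ in range(epochs):
        for z, label in zip(Z, y):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = label * sum(wj * zj for wj, zj in zip(w, z))
            w = [(1.0 - 1.0 / t) * wj for wj in w]  # shrink by 1 - eta * lam
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * label * zj for wj, zj in zip(w, z)]
    return mean, w

def svm_decision(x, mean, w):
    """Signed score; a positive value means accept the claimed identity."""
    z = [xd - md for xd, md in zip(x, mean)] + [1.0]
    return sum(wj * zj for wj, zj in zip(w, z))

# Hypothetical 1-D models given as (weights, means, variances).
client_gmm = ([1.0], [[1.0]], [[1.0]])
world_gmm = ([0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]])

# Toy accesses of different lengths: two client accesses, two impostor accesses.
accesses = [
    ([[0.9], [1.2], [1.1]], 1),
    ([[1.0], [0.7]], 1),
    ([[-1.1], [-0.8], [-1.3], [-0.9]], -1),
    ([[-1.0], [-1.2]], -1),
]
X = [score_pair(frames, client_gmm, world_gmm) for frames, _ in accesses]
y = [label for _, label in accesses]
mean, w = train_linear_svm(X, y)
print(svm_decision(score_pair([[1.1], [0.8]], client_gmm, world_gmm), mean, w))
```

In this sketch, thresholding the difference between the two scores is the special case of a fixed linear decision function; training the SVM on the score pairs lets the weights and offset be fitted per client instead.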


Support Vector Machine; Equal Error Rate; World Model; Discriminative Model; Baseline System





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Quan Le (1)
  • Samy Bengio (1)
  1. IDIAP, Martigny, Switzerland
