Clustering Based Under-Sampling for Improving Speaker Verification Decisions Using AdaBoost

  • Hakan Altınçay
  • Cem Ergün
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3138)


The class imbalance problem arises naturally in classification tasks where far fewer training samples are available for one class than for another. Random sampling based methods are generally used to deal with this problem. This paper proposes a clustering based sampling technique that selects a subset of the majority class, which has a much larger amount of training data. The proposed approach is evaluated by designing a post-classifier using AdaBoost to improve speaker verification decisions. Experiments conducted on the NIST99 speaker verification corpus show that, in general, the proposed sampling technique provides better equal error rates (EER) than random sampling.
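The abstract does not say how the clusters are turned into a subset, so the following is only a minimal sketch of one plausible form of clustering based under-sampling, not the authors' procedure. It assumes plain k-means with as many clusters as samples to keep, and keeps the majority sample nearest each centroid. The function names (`squared_distance`, `kmeans`, `cluster_undersample`) and the parameters `n_keep`, `iterations`, and `seed` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns k centroids."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct samples.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_undersample(majority, n_keep, seed=0):
    """Keep n_keep majority samples: the unused sample nearest each centroid.

    Requires n_keep <= len(majority).
    """
    centroids = kmeans(majority, n_keep, seed=seed)
    chosen = []  # indices into `majority`, so no sample is picked twice
    for c in centroids:
        candidates = [i for i in range(len(majority)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: squared_distance(majority[i], c)))
    return [majority[i] for i in chosen]


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 50 majority samples reduced to match 5 minority samples.
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    balanced_majority = cluster_undersample(majority, n_keep=5)
    print(balanced_majority)
```

Unlike random sampling, a selection of this kind spreads the retained samples over the regions the majority class occupies. The reduced majority set, together with the full minority set, could then be used to train a boosted classifier.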


Keywords: Training Sample; Gaussian Mixture Model; Majority Class; Minority Class; Equal Error Rate



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Hakan Altınçay (1)
  • Cem Ergün (1)
  1. Advanced Technology Research and Development Institute, Eastern Mediterranean University, Gazi Mağusa, KKTC, Mersin 10, Turkey
