Abstract
The class imbalance problem arises naturally in classification tasks where far fewer training samples are available for one class than for another. Random sampling methods are generally used to deal with this problem. This paper proposes a clustering-based sampling technique that selects a subset of the majority class, that is, the class with the much larger amount of training data. The approach is evaluated by designing a post-classifier with AdaBoost to improve speaker verification decisions. Experiments on the NIST99 speaker verification corpus show that, in general, the proposed sampling technique yields better equal error rates (EER) than random sampling.
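As a rough illustration of clustering-based under-sampling of a majority class, the sketch below clusters the majority samples and keeps the sample closest to each cluster centre. The abstract does not say which clustering algorithm is used or how representatives are chosen, so k-means and the nearest-to-centroid rule are assumptions made here, and the names `kmeans` and `cluster_undersample` are hypothetical. It should not be read as the authors' procedure.

```python
import numpy as np


def _sq_dists(X, C):
    """Squared Euclidean distances, shape (n_samples, n_centroids)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids from k distinct samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _sq_dists(X, centroids).argmin(axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an emptied cluster keeps its old centroid
                updated[j] = members.mean(axis=0)
        converged = np.allclose(updated, centroids)
        centroids = updated
        if converged:
            break
    # Recompute labels so they match the returned centroids.
    labels = _sq_dists(X, centroids).argmin(axis=1)
    return centroids, labels


def cluster_undersample(X_major, n_keep, seed=0):
    """Return sorted indices of at most n_keep majority-class samples.

    Assumption: each cluster is represented by the sample nearest its
    centroid. Duplicates are merged, so fewer than n_keep may be returned.
    """
    centroids, _ = kmeans(X_major, n_keep, seed=seed)
    nearest = _sq_dists(X_major, centroids).argmin(axis=0)
    return np.unique(nearest)


# Synthetic data only: 200 majority samples versus 20 minority samples.
rng = np.random.default_rng(1)
X_major = rng.normal(loc=0.0, size=(200, 2))
X_minor = rng.normal(loc=3.0, size=(20, 2))

keep = cluster_undersample(X_major, n_keep=len(X_minor))
X_balanced = np.vstack([X_major[keep], X_minor])
y_balanced = np.concatenate([np.zeros(len(keep)), np.ones(len(X_minor))])
```

The balanced arrays could then be passed to any classifier, such as a boosting-based one. Compared with drawing majority samples at random, picking one representative per cluster tends to spread the retained samples across the regions the majority class occupies.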
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Altınçay, H., Ergün, C. (2004). Clustering Based Under-Sampling for Improving Speaker Verification Decisions Using AdaBoost. In: Fred, A., Caelli, T.M., Duin, R.P.W., Campilho, A.C., de Ridder, D. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2004. Lecture Notes in Computer Science, vol 3138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27868-9_76
DOI: https://doi.org/10.1007/978-3-540-27868-9_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22570-6
Online ISBN: 978-3-540-27868-9
eBook Packages: Springer Book Archive