Abstract
This paper deals with the problem of training a discriminative classifier when the data sets are imbalanced. More specifically, this work is concerned with the problem of classifying a sample as belonging, or not, to a Target Class (TC) when the number of examples from the “Non-Target Class” (NTC) is much larger than the number of TC examples. The heuristic method called Non Target Incremental Learning (NTIL) has already been shown to be effective at extracting, from the pool of NTC representatives, the training subset that is most discriminant with respect to the TC when an Artificial Neural Network is used as the classifier (ISMIS 2003). In this paper, the effectiveness of this method is also shown for Support Vector Machines.
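The abstract does not spell out how NTIL chooses the NTC subset. As a rough illustration of the general idea of incrementally growing a non-target training subset, the sketch below starts from a small NTC seed, trains a model, and then repeatedly adds the NTC candidates the current model wrongly accepts as TC (hardest first) until none remain or an iteration cap is reached. This selection rule, the names `incremental_ntc_selection` and `train_nearest_mean`, the parameters `initial_size`, `batch_size` and `max_iters`, and the nearest-mean scorer standing in for the SVM are all hypothetical; this is not the authors' published procedure.

```python
# Hypothetical sketch, not the authors' NTIL procedure: grow the non-target
# (NTC) training subset with the NTC samples the current model wrongly accepts.
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Scorer = Callable[[Vector], float]  # score > 0 means "accept as target class"
Trainer = Callable[[Sequence[Vector], Sequence[Vector]], Scorer]


def train_nearest_mean(tc: Sequence[Vector], ntc: Sequence[Vector]) -> Scorer:
    """Stand-in for an SVM (assumption): compares squared distances to class means."""
    def mean(points: Sequence[Vector]) -> Vector:
        n = len(points)
        return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

    def sqdist(x: Vector, m: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, m))

    mu_tc, mu_ntc = mean(tc), mean(ntc)
    # Positive when x is closer to the TC mean than to the NTC mean.
    return lambda x: sqdist(x, mu_ntc) - sqdist(x, mu_tc)


def incremental_ntc_selection(
    tc: Sequence[Vector],
    ntc_pool: Sequence[Vector],
    train: Trainer,
    initial_size: int = 2,
    batch_size: int = 2,
    max_iters: int = 10,
) -> Tuple[List[Vector], Scorer]:
    """Return the selected NTC subset and the model trained on it."""
    selected = list(ntc_pool[:initial_size])   # small seed subset
    remaining = list(ntc_pool[initial_size:])  # candidates not yet used
    model = train(tc, selected)
    for _ in range(max_iters):
        # NTC candidates the model currently accepts as TC, hardest (highest score) first.
        errors = sorted((x for x in remaining if model(x) > 0), key=model, reverse=True)
        if not errors:
            break  # every remaining NTC candidate is already rejected
        new = errors[:batch_size]
        selected.extend(new)
        remaining = [x for x in remaining if x not in new]
        model = train(tc, selected)  # retrain on TC plus the enlarged NTC subset
    return selected, model
```

The nearest-mean scorer is used only so the sketch runs without third-party packages. To experiment with an actual SVM, `train` could instead wrap scikit-learn's `SVC`, fitting on the TC samples (labelled 1) and the selected NTC samples (labelled 0) and returning a function based on `decision_function`, so that positive scores mean TC.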
This work has been supported by Ministerio de Ciencia y Tecnología, Spain, under Project TIC2003-08382-C05-03.
References
Batista, G.: A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explorations 6(1), 20–29 (2004)
Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: Special issue on learning from imbalanced data sets. SIGKDD Explorations 6(1), 1–6 (2004)
Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20, 273–297 (1995)
Farrell, K.R., Mammone, R.J., Assaleh, K.T.: Speaker recognition using neural networks and conventional classifiers. IEEE Transactions on Speech and Audio Processing, part II, 2(1) (1994)
Japkowicz, N., Stephen, S.: The class imbalance problem: A systematic study. Intelligent Data Analysis 6(5), 429–449 (2002)
Juszczak, P., Duin, R.P.W.: Uncertainty sampling methods for one-class classifiers. In: Proc. of the Workshop on Learning from Imbalanced Datasets II, ICML (2003)
Mansfield, A.J., Wayman, J.L.: Best practices in testing and reporting performance of biometric devices, Version 2.01. Technical report (2002)
del Brio, B.M., Sanz Molina, A.: Redes Neuronales y Sistemas Borrosos. Ra-Ma (1997)
Osuna, E., Freund, R., Girosi, F.: Support vector machines: Training and applications. Technical report (1997)
Solomonoff, A., Quillen, C., Campbell, W.M.: Channel compensation for SVM speaker recognition. In: Proc. Odyssey 2004, the Speaker and Language Recognition Workshop, May 31 - June 3 (2004)
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Vivaracho, C.E., Ortega-Garcia, J., Alonso, L., Moro, Q.I.: Extracting the most discriminant subset from a pool of candidates to optimize discriminant classifier training. In: Zhong, N., Raś, Z.W., Tsumoto, S., Suzuki, E. (eds.) ISMIS 2003. LNCS (LNAI), vol. 2871, pp. 640–645. Springer, Heidelberg (2003)
Vivaracho-Pascual, C., Ortega-Garcia, J., Alonso-Romero, L., Moro-Sancho, Q.: A comparative study of MLP-based artificial neural networks in text-independent speaker verification against GMM-based systems. In: Dalsgaard, B.L.P., Benner, H. (eds.) Proc. of Eurospeech 2001, ISCA, September 3-7, vol. 3, pp. 1753–1756 (2001)
Weiss, G.M.: Mining with rarity: A unifying framework. SIGKDD Explorations 6(1), 7–19 (2004)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vivaracho, C.E. (2006). Improving SVM Training by Means of NTIL When the Data Sets Are Imbalanced. In: Esposito, F., Raś, Z.W., Malerba, D., Semeraro, G. (eds) Foundations of Intelligent Systems. ISMIS 2006. Lecture Notes in Computer Science, vol 4203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875604_14
DOI: https://doi.org/10.1007/11875604_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45764-0
Online ISBN: 978-3-540-45766-4
eBook Packages: Computer Science (R0)