An Improved Algorithm for SVMs Classification of Imbalanced Data Sets

  • Cristiano Leite Castro
  • Mateus Araujo Carvalho
  • Antônio Padua Braga
Part of the Communications in Computer and Information Science book series (CCIS, volume 43)

Abstract

Support Vector Machines (SVMs) have strong theoretical foundations and have achieved excellent empirical results in many pattern recognition and data mining applications. However, when an SVM is trained on an imbalanced data set, in which examples of the target (minority) class are outnumbered by examples of the non-target (majority) class, the classifier performs less well. Small, heavily imbalanced data sets are common in, for instance, medical diagnosis and text classification. In this paper, we propose the Boundary Elimination and Domination algorithm (BED) to improve SVM class-prediction accuracy in applications with imbalanced class distributions. BED is an informative resampling strategy in input space: to balance the class distributions, it uses density information from the training set to remove noisy majority-class examples and to generate new synthetic minority-class examples. In our experiments, we compared BED with the standard SVM and with the Synthetic Minority Oversampling Technique (SMOTE), a popular resampling strategy in the literature. Our results show that the new approach improves SVM classifier performance on several real-world imbalanced problems.
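The abstract describes BED only at the level of its two steps (remove noisy majority examples using density information, then generate synthetic minority examples); its exact rules are not given here. As a rough illustration of that clean-then-oversample idea, the sketch below swaps in two standard techniques: a Wilson-style edited-nearest-neighbour filter for the majority class and SMOTE-style interpolation (the baseline named in the abstract) for the minority class. It is not the authors' BED algorithm. The function names `k_nearest`, `clean_majority`, `smote` and `rebalance`, the neighbourhood size `k`, the random choice of base examples, and the target of equal class sizes are all assumptions made for illustration.

```python
import math
import random

# Illustrative sketch only: NOT the authors' BED algorithm. It stands in a
# Wilson-style edited-nearest-neighbour filter for BED's majority-class
# cleaning and SMOTE-style interpolation for its minority-class generation.


def k_nearest(index, points, k):
    """Indices of the k points nearest (Euclidean) to points[index], excluding itself."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: math.dist(points[index], points[j]))
    return others[:k]


def clean_majority(majority, minority, k=3):
    """Drop majority examples whose k nearest neighbours (over both classes)
    are mostly minority examples; ties are kept."""
    points = majority + minority  # indices >= len(majority) are minority
    kept = []
    for i, x in enumerate(majority):
        neighbours = k_nearest(i, points, k)
        n_minority = sum(1 for j in neighbours if j >= len(majority))
        if n_minority <= k // 2:
            kept.append(x)
    return kept


def smote(minority, n_synthetic, k=5, seed=None):
    """Create n_synthetic points, each on the segment between a randomly chosen
    minority example and one of its k nearest minority neighbours."""
    k = min(k, len(minority) - 1)
    if n_synthetic > 0 and k < 1:
        raise ValueError("SMOTE needs at least two minority examples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(i, minority, k))
        gap = rng.random()  # uniform in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def rebalance(majority, minority, k=3, seed=None):
    """Clean the majority class, then oversample the minority class until
    both classes have the same size (an assumed target ratio)."""
    cleaned = clean_majority(majority, minority, k)
    deficit = max(0, len(cleaned) - len(minority))
    return cleaned, minority + smote(minority, deficit, k, seed)


if __name__ == "__main__":
    # Toy data: the last majority point sits inside the minority cluster.
    majority = [(float(x), 0.0) for x in range(6)] + [(0.5, 5.0)]
    minority = [(0.0, 5.0), (1.0, 5.0), (0.0, 6.0)]
    cleaned, balanced = rebalance(majority, minority, k=3, seed=0)
    print(len(cleaned), len(balanced))  # 6 6
```

In the toy run, the majority point at (0.5, 5.0) is removed because all three of its nearest neighbours are minority examples, and three synthetic minority points are added to match the six remaining majority points. The two returned lists would then serve as the training set for an ordinary SVM; fixing `seed` makes the synthetic points reproducible.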

Keywords

Support Vector Machines, supervised learning, imbalanced data sets, resampling strategy, ROC curves, pattern recognition applications



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Cristiano Leite Castro (1)
  • Mateus Araujo Carvalho (1)
  • Antônio Padua Braga (1)
  1. Department of Electronics Engineering, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
