
A Proposal of Evolutionary Prototype Selection for Class Imbalance Problems

  • Salvador García
  • José Ramón Cano
  • Alberto Fernández
  • Francisco Herrera
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4224)

Abstract

Unbalanced data in a classification problem arise when some classes have many more instances than others. Several solutions have been proposed to address this problem at the data level by under-sampling. The aim of this work is to propose evolutionary prototype selection algorithms that tackle the problem of unbalanced data by using a new fitness function. The results show that balancing the data by evolutionary under-sampling outperforms previously proposed under-sampling methods in classification accuracy, while yielding reduced subsets with a good class balance.
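The abstract does not spell out the new fitness function, so the following Python sketch only illustrates the general idea of evolutionary under-sampling, under stated assumptions. Each chromosome is a binary mask over the training instances. Its fitness is the leave-one-out 1-NN geometric mean of the per-class accuracies (a common metric for imbalanced data) minus a penalty for imbalance among the selected instances, and a plain generational genetic algorithm performs the search. The penalty form, the weight `PENALTY`, and the names `geometric_mean`, `fitness`, and `evolve` are hypothetical. This is not the authors' exact fitness function or evolutionary model.

```python
import math
import random
from typing import List, Sequence

PENALTY = 0.2  # hypothetical penalty weight (an assumption, not taken from the paper)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def geometric_mean(X, y, mask: List[bool]) -> float:
    """Leave-one-out 1-NN g-mean over the whole training set, using only the
    instances selected by `mask` as prototypes. Labels: 1 = minority, 0 = majority."""
    selected = [j for j, keep in enumerate(mask) if keep]
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for i, (xi, yi) in enumerate(zip(X, y)):
        totals[yi] += 1
        candidates = [j for j in selected if j != i]  # leave instance i out
        if not candidates:
            continue  # no prototype available: counted as a miss
        nearest = min(candidates, key=lambda j: sq_dist(xi, X[j]))
        if y[nearest] == yi:
            hits[yi] += 1
    tpr = hits[1] / totals[1] if totals[1] else 0.0  # minority-class accuracy
    tnr = hits[0] / totals[0] if totals[0] else 0.0  # majority-class accuracy
    return math.sqrt(tpr * tnr)


def fitness(X, y, mask: List[bool]) -> float:
    """Hypothetical fitness: g-mean minus a penalty for an unbalanced selection."""
    g = geometric_mean(X, y, mask)
    n_pos = sum(1 for keep, label in zip(mask, y) if keep and label == 1)
    n_neg = sum(1 for keep, label in zip(mask, y) if keep and label == 0)
    if n_neg == 0:
        return g - PENALTY
    return g - abs(1 - n_pos / n_neg) * PENALTY


def evolve(X, y, pop_size=20, generations=50, p_mut=0.05, seed=0):
    """Plain generational GA over binary selection masks (a stand-in search model)."""
    rng = random.Random(seed)
    n = len(X)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    scored = sorted(((fitness(X, y, m), m) for m in pop), key=lambda t: t[0], reverse=True)
    for _ in range(generations):
        def tournament():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [scored[0][1]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover followed by bit-flip mutation.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        scored = sorted(((fitness(X, y, m), m) for m in children),
                        key=lambda t: t[0], reverse=True)
    best_fit, best_mask = scored[0]
    return best_mask, best_fit


# Toy usage: six majority (0) points near the origin, two minority (1) points.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.15, 0.05), (0.25, 0.2),
     (1.0, 1.0), (1.1, 0.9)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
best_mask, best_score = evolve(X, y)
print(sum(best_mask), "instances kept, fitness =", round(best_score, 3))
```

In this toy setting, keeping every instance classifies all points correctly but is penalized for the 6:2 imbalance, whereas a subset with two instances of each class keeps the same g-mean with no penalty. The penalty is what pushes the search toward balanced, reduced subsets.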

Keywords

Geometric Mean, Class Distribution, Minority Class, Balance Accuracy, Class Imbalance Problem
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Salvador García (1)
  • José Ramón Cano (2)
  • Alberto Fernández (1)
  • Francisco Herrera (1)
  1. Department of Computer Science and Artificial Intelligence, E.T.S.I. Informática, University of Granada, Granada, Spain
  2. Department of Computer Science, University of Jaén, Linares, Jaén, Spain
