Soft Computing, 13:213

Evolutionary rule-based systems for imbalanced data sets

  • Albert Orriols-Puig
  • Ester Bernadó-Mansilla


This paper investigates the capabilities of evolutionary on-line rule-based systems, also called learning classifier systems (LCSs), for extracting knowledge from imbalanced data. While some learners may suffer from class imbalances and from instances sparsely distributed across the feature space, we show that LCSs are flexible methods that can be adapted to detect such cases and find suitable models. Results on artificial data sets specifically designed to test the capabilities of LCSs on imbalanced data show that LCSs are able to extract knowledge from highly imbalanced domains. When applied to real-world problems, LCSs prove to be among the most robust methods in a comparison with instance-based learners, decision trees, and support vector machines. Moreover, all the learners benefit from re-sampling techniques. Although no single re-sampling technique performs best on all data sets and for all learners, those based on over-sampling seem to perform better on average. The paper adapts and analyzes LCSs for challenging imbalanced data sets and establishes the basis for further study of which combination of re-sampling technique and learner is best suited to a specific kind of problem.


Keywords: Imbalanced data, Rule-based systems, Data preprocessing, Classification



Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  1. Grup de Recerca en Sistemes Intelligents, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull, Barcelona, Spain
