Soft Computing, Volume 21, Issue 10, pp 2609–2618

Searching for the most significant rules: an evolutionary approach for subgroup discovery

  • Victoria Pachón
  • Jacinto Mata
  • Juan Luis Domínguez
Methodologies and Application

Abstract

In this paper, a new genetic algorithm (GAR-SD\(^{+}\)) for subgroup discovery tasks is described. The main feature of this new method is that it can work with both discrete and continuous attributes without prior discretization. The ranges of numeric attributes are obtained during the rule induction process itself, which ensures that these intervals are the most suitable for maximizing the quality measures. An experimental study was carried out to verify the performance of the method. GAR-SD\(^{+}\) was compared with other subgroup discovery methods on several measures (number of rules, number of attributes, significance, unusualness, support and confidence). For subgroup discovery tasks, GAR-SD\(^{+}\) obtained good results compared with existing algorithms.
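
As a rough illustration of how a rule with numeric intervals can be scored, the Python sketch below evaluates one rule on a toy dataset using support, confidence and unusualness. The formulas are the definitions commonly used in the subgroup discovery literature and are an assumption here; the paper's exact definitions may differ. The names `covers`, `quality`, `DATA` and `RULE` are hypothetical and do not come from GAR-SD\(^{+}\).

```python
from typing import Dict, List, Tuple, Union

# A condition is either a closed numeric interval (low, high) or a discrete value.
Condition = Union[Tuple[float, float], str]
Rule = Dict[str, Condition]
Example = Dict[str, Union[float, str]]


def covers(rule: Rule, example: Example) -> bool:
    """Return True if the example satisfies every condition of the rule."""
    for attr, cond in rule.items():
        value = example[attr]
        if isinstance(cond, tuple):
            low, high = cond
            if not (low <= value <= high):
                return False
        elif value != cond:
            return False
    return True


def quality(rule: Rule, target: str, data: List[Example],
            class_attr: str = "class") -> Dict[str, float]:
    """Compute support, confidence and unusualness (assumed definitions)."""
    n = len(data)
    n_cond = sum(covers(rule, ex) for ex in data)
    n_class = sum(ex[class_attr] == target for ex in data)
    n_both = sum(covers(rule, ex) and ex[class_attr] == target for ex in data)

    support = n_both / n                              # n(Cond and Class) / N
    confidence = n_both / n_cond if n_cond else 0.0   # n(Cond and Class) / n(Cond)
    # Weighted relative accuracy: coverage * (confidence - class prior)
    unusualness = (n_cond / n) * (confidence - n_class / n)
    return {"support": support, "confidence": confidence,
            "unusualness": unusualness}


# Hypothetical toy data: one numeric and one discrete attribute.
DATA: List[Example] = [
    {"age": 25, "sex": "F", "class": "yes"},
    {"age": 35, "sex": "F", "class": "yes"},
    {"age": 45, "sex": "M", "class": "no"},
    {"age": 30, "sex": "M", "class": "no"},
]

# IF age in [20, 36] AND sex = F THEN class = yes
RULE: Rule = {"age": (20, 36), "sex": "F"}

if __name__ == "__main__":
    print(quality(RULE, "yes", DATA))
```

In an evolutionary setting, a function of this kind could serve as part of the fitness evaluation, with the interval bounds being the values that the search adjusts; that coupling is only a sketch of the idea described in the abstract.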

Keywords

  • Data mining
  • Subgroup discovery
  • Evolutionary algorithms

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Victoria Pachón (1)
  • Jacinto Mata (1)
  • Juan Luis Domínguez (1)
  1. Escuela Técnica Superior de Ingeniería, Universidad de Huelva, Huelva, Spain
