Soft Computing, Volume 20, Issue 11, pp 4313–4330

The influence of noise on the evolutionary fuzzy systems for subgroup discovery

  • J. Luengo
  • A. M. García-Vico
  • M. D. Pérez-Godoy
  • C. J. Carmona

Abstract

External factors such as noise in the data can affect the data mining process. Noise is a common problem with several negative consequences, involving errors in data collection and preparation and, above all, in the results obtained by the data mining techniques employed. The capabilities of models built under such circumstances depend heavily on the quality of the training data. Hence, problems containing noise are complex, and accurate solutions are often difficult to achieve. Subgroup discovery, a particular supervised learning field, has overlooked the analysis of noise and its impact on the descriptions obtained. This paper presents an analysis of the impact of noise on the most relevant evolutionary fuzzy systems for subgroup discovery. We also examine how filtering techniques, devised for predictive tasks, may alleviate the impact of noise on descriptive fields such as subgroup discovery. Specifically, the analysis is carried out using recent filtering techniques at several class noise levels. The results show two different behaviours. On the one hand, the SDIGA and NMEEFSD algorithms show a decrease in the quality of the subgroups as the noise increases, which makes noise filtering necessary to compensate for this loss of quality. On the other hand, the FuGePSD algorithm demonstrates a great capacity to work in noisy environments without the need for a preliminary filter. The study is completed with an analysis of interpretability under the influence of noise, focused on the number of rules and variables.

Keywords

Subgroup discovery, Class noise, Noise filters, Evolutionary fuzzy systems

Notes

Acknowledgments

This work was supported by the Spanish Ministry of Economy and Competitiveness under Project TIN2015-68454-R (FEDER Funds), by the Spanish Ministry of Science and Technology under Project TIN2014-57251-P (National Projects) and by the Regional Excellence Projects P11-TIC-7765 and P12-TIC-2958.

Compliance with ethical standards

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
  2. Department of Computer Science, University of Jaen, Jaén, Spain
  3. Department of Civil Engineering, Languages and Systems Area, University of Burgos, Burgos, Spain
