The influence of noise on the evolutionary fuzzy systems for subgroup discovery
External factors such as the presence of noise in data can affect the data mining process. Noise is a common problem with several negative consequences, involving errors in data collection and preparation and, above all, in the results obtained by the data mining techniques employed. The capabilities of models built under such circumstances depend heavily on the quality of the training data. Hence, problems containing noise are complex, and accurate solutions are often difficult to achieve. Subgroup discovery, a particular supervised learning field, has overlooked the analysis of noise and its impact on the descriptions obtained. This paper presents an analysis of the impact of noise on the most relevant evolutionary fuzzy systems for subgroup discovery. We also examine how filtering techniques devised for predictive tasks may alleviate the impact of noise on descriptive fields such as subgroup discovery. Specifically, the analysis is carried out using recent filtering techniques at several class noise levels. The results show two different behaviours. On the one hand, the SDIGA and NMEEFSD algorithms show a decrease in subgroup quality as noise increases, which makes noise filtering necessary to compensate for this loss of quality. On the other hand, the FuGePSD algorithm proves able to work in noisy environments without a preliminary filter. The study is completed with an analysis of interpretability under the influence of noise, focused on the number of rules and variables.
Keywords: Subgroup discovery, Class noise, Noise filters, Evolutionary fuzzy systems
This work was supported by the Spanish Ministry of Economy and Competitiveness under Project TIN2015-68454-R (FEDER Funds), by the Spanish Ministry of Science and Technology under Project TIN2014-57251-P (National Projects) and by the Regional Excellence Projects P11-TIC-7765 and P12-TIC-2958.
Compliance with ethical standards
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
This article does not contain any studies with human participants or animals performed by any of the authors.