A First Approach in the Class Noise Filtering Approaches for Fuzzy Subgroup Discovery

  • C. J. Carmona
  • J. Luengo
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 368)


Noise in data is a common and unavoidable problem that affects the data collection and data preparation processes in Data Mining applications, where errors commonly occur. The performance of models built under such circumstances depends heavily on the quality of the training data. Hence, problems containing noise are complex, and accurate solutions are often difficult to achieve without specialized techniques. Subgroup discovery, a particular supervised learning field, has overlooked the analysis of noise and its impact on the descriptions obtained. In this paper, the impact of noise on subgroup discovery is analyzed in a complete experimental study, using recent filtering techniques at several class noise levels. Specifically, the analysis is performed with the FuGePSD algorithm, a state-of-the-art subgroup discovery (SD) algorithm based on genetic programming and fuzzy logic.
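To make the notion of a "class noise level" concrete, the following minimal Python sketch corrupts a chosen fraction of class labels. The abstract does not say which noise scheme the study uses, so the uniform relabeling shown here is an assumption, and the function name `inject_class_noise` and its parameters are hypothetical names chosen for illustration.

```python
import random


def inject_class_noise(labels, noise_level, classes=None, seed=0):
    """Return a copy of `labels` in which a fraction `noise_level` of the
    entries is relabeled with a different class chosen uniformly at random.

    Assumption: a uniform class noise scheme; the paper's exact scheme is
    not stated in the abstract.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    rng = random.Random(seed)
    classes = sorted(set(labels)) if classes is None else list(classes)
    if len(classes) < 2:
        raise ValueError("need at least two classes to flip a label")

    noisy = list(labels)
    n_noisy = round(noise_level * len(noisy))
    # Pick distinct positions to corrupt, then swap in any other class.
    for i in rng.sample(range(len(noisy)), n_noisy):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


if __name__ == "__main__":
    clean = ["pos"] * 50 + ["neg"] * 50
    for level in (0.05, 0.10, 0.20):
        noisy = inject_class_noise(clean, level)
        changed = sum(a != b for a, b in zip(clean, noisy))
        print(f"noise level {level:.2f}: {changed} labels changed")
```

A noise filter would then be applied to the corrupted training set before running the subgroup discovery algorithm, so that descriptions obtained with and without filtering can be compared at each noise level.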


Keywords: Subgroup discovery, Class noise, Noise filters



Supported by the Spanish Ministry of Economy and Competitiveness under project TIN2012-33856 (FEDER Funds), the Spanish Ministry of Science and Technology under Projects TIN2011-28488 and TIN2010-15055, and the Regional Projects P10-TIC-6858 and P12-TIC-2958.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Dept. of Civil Engineering, University of Burgos, Burgos, Spain
