Progress in Artificial Intelligence, Volume 7, Issue 1, pp 55–64

Improvement of subgroup descriptions in noisy data by detecting exceptions

  • Pedro González
  • Ángel Miguel García-Vico
  • Cristóbal José Carmona
  • María José del Jesus
Regular Paper


The presence of noise in datasets to which data mining techniques are applied can greatly reduce the quality and interest of the knowledge extracted. Subgroup discovery, a supervised descriptive rule discovery technique, is not exempt from this problem. The aim of this paper is to improve the descriptions of subgroups previously obtained by any subgroup discovery algorithm in noisy datasets. This is achieved using the post-processing approach of the MEFES algorithm, which first detects exceptions in the input subgroups and then includes those exceptions in the descriptions. Experiments on noisy datasets show that the proposal is suitable for improving the quality of the results.
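The idea of adding an exception to a subgroup description can be illustrated with a minimal Python sketch. It is not the MEFES algorithm, which detects the exceptions itself; here the exception is supplied by hand to show its effect on the description. The `Subgroup` class, the `confidence` helper, the attribute names and the six-row toy dataset are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Example = Dict[str, str]
Condition = Callable[[Example], bool]


@dataclass
class Subgroup:
    """Rule of the form IF condition [EXCEPT exception] THEN target (illustrative)."""
    condition: Condition
    target: str
    exception: Optional[Condition] = None

    def covers(self, ex: Example) -> bool:
        # An example is covered if it satisfies the condition
        # and does not fall into the exception, when one is given.
        if not self.condition(ex):
            return False
        return self.exception is None or not self.exception(ex)


def confidence(sg: Subgroup, data: List[Example]) -> float:
    """Fraction of covered examples that belong to the target class."""
    covered = [ex for ex in data if sg.covers(ex)]
    if not covered:
        return 0.0
    hits = sum(1 for ex in covered if ex["class"] == sg.target)
    return hits / len(covered)


# Toy data, invented for this sketch only.
DATA: List[Example] = [
    {"age": "old", "exercise": "no", "class": "high"},
    {"age": "old", "exercise": "no", "class": "high"},
    {"age": "old", "exercise": "no", "class": "high"},
    {"age": "old", "exercise": "yes", "class": "low"},
    {"age": "old", "exercise": "yes", "class": "low"},
    {"age": "young", "exercise": "no", "class": "low"},
]

# Input subgroup: IF age = old THEN class = high
base = Subgroup(condition=lambda ex: ex["age"] == "old", target="high")

# Same subgroup with an exception: ... EXCEPT exercise = yes
refined = Subgroup(
    condition=lambda ex: ex["age"] == "old",
    target="high",
    exception=lambda ex: ex["exercise"] == "yes",
)

print(confidence(base, DATA))     # 3 of 5 covered examples are "high"
print(confidence(refined, DATA))  # 3 of 3 covered examples are "high"
```

On this toy data the exception removes the two covered examples that contradict the rule, so the refined description is more precise while covering fewer examples. How exceptions are actually searched for and evaluated is specific to the algorithm and is not modelled here.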


Subgroup discovery, Exceptions, Noisy data, MEFES



This paper was supported by the Spanish Ministry of Economy and Competitiveness under Project TIN2015-68454-R (FEDER Funds).



Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Department of Computer Science, University of Jaen, Jaén, Spain
  2. Department of Civil Engineering, University of Burgos, Burgos, Spain
  3. Leicester School of Pharmacy, De Montfort University, Leicester, UK
