Soft Computing 13:959

A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability

  • S. García
  • A. Fernández
  • J. Luengo
  • F. Herrera
Original Paper


The experimental analysis of the performance of a proposed method is a crucial and necessary task in research. This paper focuses on the statistical analysis of results in the field of genetics-based machine learning. It presents a study of a set of techniques that can be used to carry out a rigorous comparison among algorithms, in terms of obtaining successful classification models. Two accuracy measures for multi-class problems have been employed: classification rate and Cohen’s kappa. Furthermore, two interpretability measures have been employed: the size of the rule set and the number of antecedents. We have studied whether the samples of results obtained by genetics-based classifiers, using the performance measures cited above, satisfy the conditions required for analysis by means of parametric tests. The results show that the fulfilment of these conditions is problem-dependent and inconsistent, which supports the use of non-parametric statistics in the experimental analysis. In addition, non-parametric tests can be satisfactorily employed for comparing generic classifiers over various data sets with respect to any performance measure. On the basis of these findings, we propose the use of the most powerful non-parametric statistical tests to carry out multiple comparisons. However, the statistical analysis conducted on interpretability must be considered with care.
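The two ingredients the abstract relies on are standard: Cohen’s kappa computed from a confusion matrix, and the Friedman test (with the Iman–Davenport correction) for comparing several algorithms over several data sets. A minimal Python sketch of both, with function names of our own choosing rather than anything from the paper, might read:

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = actual, cols = predicted)."""
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(len(confusion))) / n      # observed agreement
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(col) for col in zip(*confusion)]
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2        # chance agreement
    return (p_o - p_e) / (1 - p_e)


def friedman_test(results):
    """Friedman statistic and Iman-Davenport correction for a results table:
    one row per data set, one column per algorithm, higher values = better."""
    n, k = len(results), len(results[0])
    avg_ranks = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: -row[j])                # best algorithm first
        pos = 0
        while pos < k:                                                 # tied values share the mean rank
            tied = [j for j in order if row[j] == row[order[pos]]]
            mean_rank = sum(range(pos + 1, pos + len(tied) + 1)) / len(tied)
            for j in tied:
                avg_ranks[j] += mean_rank
            pos += len(tied)
    avg_ranks = [r / n for r in avg_ranks]
    # Friedman chi-square over the average ranks R_j
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    f_id = (n - 1) * chi2 / (n * (k - 1) - chi2)                       # Iman-Davenport F statistic
    return avg_ranks, chi2, f_id
```

For example, a two-class confusion matrix `[[20, 5], [10, 15]]` gives an observed agreement of 0.7 against a chance agreement of 0.5, hence kappa = 0.4; the Friedman routine, fed one accuracy per (data set, algorithm) pair, returns the average ranks that post-hoc procedures such as Holm’s test operate on.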


Keywords: Genetics-based machine learning · Genetic algorithms · Statistical tests · Non-parametric tests · Cohen’s kappa · Interpretability · Classification



The study was supported by the Spanish Ministry of Science and Technology under Project TIN-2005-08386-C05-01. J. Luengo holds an FPU scholarship from the Spanish Ministry of Education and Science. The authors are very grateful to the anonymous reviewers for their valuable suggestions and comments to improve the quality of this paper. We are also very grateful to Prof. Bacardit, Prof. Bernadó-Mansilla and Prof. Aguilar-Ruiz for providing the KEEL software with the GASSIST-ADI, XCS and HIDER algorithms, respectively.


  1. Aguilar-Ruiz JS, Giráldez R, Riquelme JC (2007) Natural encoding for evolutionary supervised learning. IEEE Trans Evol Comput 11(4):466–479
  2. Alcalá-Fdez J, Sánchez L, García S, del Jesus MJ, Ventura S, Garrell JM, Otero J, Romero C, Bacardit J, Rivas VM, Fernández JC, Herrera F (2009) KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput 13(3):307–318
  3. Alpaydin E (2004) Introduction to machine learning. MIT Press, Cambridge
  4. Anglano C, Botta M (2002) NOW G-Net: learning classification programs on networks of workstations. IEEE Trans Evol Comput 6(13):463–480
  5. Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine
  6. Bacardit J (2004) Pittsburgh genetic-based machine learning in the data mining era: representations, generalization and run-time. Dissertation, Department of Computer Science, University Ramon Llull, Barcelona
  7. Bacardit J, Garrell JM (2003) Evolving multiple discretizations with adaptive intervals for a Pittsburgh rule-based learning classifier system. In: Proceedings of the genetic and evolutionary computation conference (GECCO’03). LNCS, vol 2724, pp 1818–1831
  8. Bacardit J, Garrell JM (2004) Analysis and improvements of the adaptive discretization intervals knowledge representation. In: Proceedings of the genetic and evolutionary computation conference (GECCO’04). LNCS, vol 3103, pp 726–738
  9. Bacardit J, Garrell JM (2007) Bloat control and generalization pressure using the minimum description length principle for Pittsburgh approach learning classifier system. In: Kovacs T, Llorá X, Takadama K (eds) Advances at the frontier of learning classifier systems. LNCS, vol 4399, pp 61–80
  10. Barandela R, Sánchez JS, García V, Rangel E (2003) Strategies for learning in class imbalance problems. Pattern Recognit 36(3):849–851
  11. Ben-David A (2007) A lot of randomness is hiding in accuracy. Eng Appl Artif Intell 20:875–885
  12. Bernadó-Mansilla E, Garrell JM (2003) Accuracy-based learning classifier systems: models, analysis and applications to classification tasks. Evol Comput 11(3):209–238
  13. Bernadó-Mansilla E, Ho TK (2005) Domain of competence of XCS classifier system in complexity measurement space. IEEE Trans Evol Comput 9(1):82–104
  14. Clark P, Niblett T (1989) The CN2 induction algorithm. Machine Learn 3(4):261–283
  15. Cohen JA (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46
  16. Corcoran AL, Sen S (1994) Using real-valued genetic algorithms to evolve rule sets for classification. In: Proceedings of the IEEE conference on evolutionary computation, pp 120–124
  17. De Jong KA, Spears WM, Gordon DF (1993) Using genetic algorithms for concept learning. Machine Learn 13:161–188
  18. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Machine Learn Res 7:1–30
  19. Drummond C, Holte RC (2006) Cost curves: an improved method for visualizing classifier performance. Machine Learn 65(1):95–130
  20. Freitas AA (2002) Data mining and knowledge discovery with evolutionary algorithms. Springer, Berlin
  21. Grefenstette JJ (1993) Genetic algorithms for machine learning. Kluwer, Norwell
  22. Guan SU, Zhu F (2005) An incremental approach to genetic-algorithms-based classification. IEEE Trans Syst Man Cybern B 35(2):227–239
  23. Hekanaho J (1998) An evolutionary approach to concept learning. Dissertation, Department of Computer Science, Åbo Akademi University, Åbo, Finland
  24. Hochberg Y (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75:800–803
  25. Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70
  26. Huang J, Ling CX (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 17(3):299–310
  27. Iman RL, Davenport JM (1980) Approximations of the critical region of the Friedman statistic. Commun Stat 18:571–595
  28. Jiao L, Liu J, Zhong W (2006) An organizational coevolutionary algorithm for classification. IEEE Trans Evol Comput 10(1):67–80
  29. Koch GG (1970) The use of non-parametric methods in the statistical analysis of a complex split plot experiment. Biometrics 26(1):105–128
  30. Landgrebe TCW, Duin RPW (2008) Efficient multiclass ROC approximation by decomposition via confusion matrix perturbation analysis. IEEE Trans Pattern Anal Mach Intell 30(5):810–822
  31. Lim T-S, Loh W-Y, Shih Y-S (2000) A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learn 40(3):203–228
  32. Markatou M, Tian H, Biswas S, Hripcsak G (2005) Analysis of variance of cross-validation estimators of the generalization error. J Machine Learn Res 6:1127–1168
  33. Rivest RL (1987) Learning decision lists. Machine Learn 2:229–246
  34. Sheskin DJ (2006) Handbook of parametric and nonparametric statistical procedures. Chapman & Hall/CRC, London/West Palm Beach
  35. Shaffer JP (1995) Multiple hypothesis testing. Ann Rev Psychol 46:561–584
  36. Sigaud O, Wilson SW (2007) Learning classifier systems: a survey. Soft Comput 11:1065–1078
  37. Sokolova M, Japkowicz N, Szpakowicz S (2006) Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In: Australian conference on artificial intelligence. LNCS, vol 4304, pp 1015–1021
  38. Tan KC, Yu Q, Ang JH (2006) A coevolutionary algorithm for rules discovery in data mining. Int J Syst Sci 37(12):835–864
  39. Tulai AF, Oppacher F (2004) Multiple species weighted voting: a genetics-based machine learning system. In: Proceedings of the genetic and evolutionary computation conference (GECCO’04). LNCS, vol 3103, pp 1263–1274
  40. Venturini G (1993) SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In: Proceedings of machine learning ECML’93. LNAI, vol 667, pp 280–296
  41. Wilson SW (1994) ZCS: a zeroth order classifier system. Evol Comput 2:1–18
  42. Wilson SW (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175
  43. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
  44. Wright SP (1992) Adjusted p-values for simultaneous inference. Biometrics 48:1005–1013
  45. Youden W (1950) Index for rating diagnostic tests. Cancer 3:32–35
  46. Zar JH (1999) Biostatistical analysis. Prentice Hall, Englewood Cliffs

Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  • S. García (1)
  • A. Fernández (2)
  • J. Luengo (2)
  • F. Herrera (2)
  1. Department of Computer Science, University of Jaén, Jaén, Spain
  2. Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
