# A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability

## Abstract

The experimental analysis on the performance of a proposed method is a crucial and necessary task to carry out in a research. This paper is focused on the statistical analysis of the results in the field of genetics-based machine Learning. It presents a study involving a set of techniques which can be used for doing a rigorous comparison among algorithms, in terms of obtaining successful classification models. Two accuracy measures for multi-class problems have been employed: classification rate and Cohen’s kappa. Furthermore, two interpretability measures have been employed: size of the rule set and number of antecedents. We have studied whether the samples of results obtained by genetics-based classifiers, using the performance measures cited above, check the necessary conditions for being analysed by means of parametrical tests. The results obtained state that the fulfillment of these conditions are problem-dependent and indefinite, which supports the use of non-parametric statistics in the experimental analysis. In addition, non-parametric tests can be satisfactorily employed for comparing generic classifiers over various data-sets considering any performance measure. According to these facts, we propose the use of the most powerful non-parametric statistical tests to carry out multiple comparisons. However, the statistical analysis conducted on interpretability must be carefully considered.

## Keywords

Genetics-based machine learning Genetic algorithms Statistical tests Non-parametric tests Cohen’s kappa Interpretability Classification## Notes

### Acknowledgments

The study was supported by the Spanish Ministry of Science and Technology under Project TIN-2005-08386-C05-01. J. Luengo holds a FPU scholarship from Spanish Ministry of Education and Science. The authors are very grateful to the anonymous reviewers for their valuable suggestions and comments to improve the quality of this paper. We also are very grateful to Prof. Bacardit, Prof. Bernadó-Mansilla and Prof. Aguilar-Ruiz for providing the KEEL software with the GASSIST-ADI, XCS and HIDER algorithms, respectively.

## References

- Aguilar-Ruiz JS, Giráldez R, Riquelme JC (2000) Natural encoding for evolutionary supervised learning. IEEE Trans Evol Comput 11(4):466–479CrossRefGoogle Scholar
- Alcalá-Fdez J, Sánchez L, García S, del Jesus MJ, Ventura S, Garrell JM, Otero J, Romero C, Bacardit J, Rivas VM, Fernández JC, Herrera F (2009) KEEL: a software tool to assess evolutionary algorithms to data mining problems. Soft Comput 13(3):307–318CrossRefGoogle Scholar
- Alpaydin E (2004) Introduction to machine learning, vol 452. MIT Press, CambridgeGoogle Scholar
- Anglano C, Botta M (2002) NOW G-Net: learning classification programs on networks of workstations. IEEE Trans Evol Comput 6(13):463–480CrossRefGoogle Scholar
- Asuncion A, Newman DJ (2007) UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/~mlearn/MLRepository.htm
- Bacardit J (2004) Pittsburgh genetic-based machine learning in the data mining era: representations, generalization and run-time, Dept. Comput. Sci., University Ramon Llull, Barcelona, SpainGoogle Scholar
- Bacardit J, Garrell JM (2003) Evolving multiple discretizations with adaptive intervals for a pittsburgh rule-based learning classifier system. In: Proceedings of the genetic and evolutionary computation conference (GECCO’03), vol 2724. LNCS, Germany, pp 1818–1831Google Scholar
- Bacardit J, Garrell JM (2004) Analysis and improvements of the adaptive discretization intervals knowledge representation. In: Proceedings of the genetic and evolutionary computation conference (GECCO’04), vol 3103. LNCS, Germany, pp 726–738Google Scholar
- Bacardit J, Garrell JM (2007) Bloat control and generalization pressure using the minimum description length principle for Pittsburgh approach learning classifier system. In: Kovacs T, Llorá X, Takadama K (eds) Advances at the frontier of learning classifier systems, vol 4399. LNCS, USA, pp 61–80Google Scholar
- Barandela R, Sánchez JS, García V, Rangel E (2003) Strategies for learning in class imbalance problems. Pattern Recognit 36(3):849–851CrossRefGoogle Scholar
- Ben-David A (2007) A lot of randomness is hiding in accuracy. Eng Appl Artif Intell 20:875–885CrossRefGoogle Scholar
- Bernadó-Mansilla E, Garrell JM (2003) Accuracy-based learning classifier systems: models, analysis and applications to classification tasks. Evol Comput 11(3):209–238CrossRefGoogle Scholar
- Bernadó-Mansilla E, Ho TK (2005) Domain of competence of XCS classifier system in complexity measurement space. IEEE Trans Evol Comput 9(1):82–104CrossRefGoogle Scholar
- Clark P, Niblett T (1989) The CN2 induction algorithm. Machine Learn 3(4):261–283Google Scholar
- Cohen JA (1960) Coefficient of agreement for nominal scales. Educ Psychol Meas 37–46Google Scholar
- Corcoran AL, Sen S (1994) Using real-valued genetic algorithms to evolve rule sets for classification. In: Proceedings of the IEEE conference on evolutionary computation, pp 120–124Google Scholar
- De Jong KA, Spears WM, Gordon DF (1993) Using genetic algorithms for concept learning. Machine Learn 13:161–188CrossRefGoogle Scholar
- Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Machine Learn Res 7:1–30Google Scholar
- Drummond C, Holte RC (2006) Cost curves: an improved method for visualizing classifier performance. Machine Learn 65(1):95–130CrossRefGoogle Scholar
- Freitas AA (2002) Data mining and knowledge discovery with evolutionary algorithms, vol 264. Springer, BerlinGoogle Scholar
- Grefenstette JJ (1993) Genetic algorithms for machine learning, vol 176. Kluwer, NorwellGoogle Scholar
- Guan SU, Zhu F (2005) An incremental approach to genetic-algorithms-based classification. IEEE Trans Syst Man Cybern B 35(2):227–239CrossRefGoogle Scholar
- Hekanaho J (1998) An evolutionary approach to concept learning. Dissertation, Department of Computer Science, Abo akademi University, Abo, FinlandGoogle Scholar
- Hochberg Y (2000) A sharper bonferroni procedure for multiple tests of significance. Biometrika 75:800–803zbMATHCrossRefMathSciNetGoogle Scholar
- Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70zbMATHMathSciNetGoogle Scholar
- Huang J, Ling CX (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 17(3):299–310CrossRefGoogle Scholar
- Iman RL, Davenport JM (1980) Approximations of the critical region of the Friedman statistic. Commun Stat 18:571–595CrossRefGoogle Scholar
- Jiao L, Liu J, Zhong W (2006) An organizational coevolutionary algorithm for classification. IEEE Trans Evol Comput 10(1):67–80CrossRefGoogle Scholar
- Koch GG (1970) The use of non-parametric methods in the statistical analysis of a complex split plot experiment. Biometrics 26(1):105–128CrossRefGoogle Scholar
- Landgrebe TCW, Duin RPW (2008) Efficient multiclass ROC approximation by decomposition via confusion matrix perturbation analysis. IEEE Trans Pattern Anal Mach Intell 30(5):810–822CrossRefGoogle Scholar
- Lim T-S, Loh W-Y, Shih Y-S (2000) A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learn 40(3):203–228zbMATHCrossRefGoogle Scholar
- Markatou M, Tian H, Biswas S, Hripcsak G (2005) Analysis of variance of cross-validation estimators of the generalization error. J Machine Learn Res 6:1127–1168MathSciNetGoogle Scholar
- Rivest RL (1987) Learning decision lists. Machine Learn 2:229–246MathSciNetGoogle Scholar
- Sheskin DJ (2006) Handbook of parametric and nonparametric statistical procedures, vol 1736. Chapman & Hall/CRC, London/West Palm BeachGoogle Scholar
- Shaffer JP (1995) Multiple hypothesis testing. Ann Rev Psychol 46:561–584CrossRefGoogle Scholar
- Sigaud O, Wilson SW (2007) Learning classifier systems: a survey. Soft Comput 11:1065–1078zbMATHCrossRefGoogle Scholar
- Sokolova M, Japkowicz N, Szpakowicz S (2006) Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In: Australian conference on artificial intelligence, vol 4304. LNCS, Germany, pp 1015–1021Google Scholar
- Tan KC, Yu Q, Ang JH (2006) A coevolutionary algorithm for rules discovery in data mining. Int J Syst Sci 37(12):835–864zbMATHCrossRefMathSciNetGoogle Scholar
- Tulai AF, Oppacher F (2004) Multiple species weighted voting - a genetics-based machine learning system. In: Proceedings of the genetic and evolutionary computation conference (GECCO’03), vol 3103. LNCS, Germany, pp 1263–1274Google Scholar
- Venturini G (1993) SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In: Proceedings of the machine learning ECML’93, vol 667. LNAI, Germany, pp 280–296Google Scholar
- Wilson SW (1994) ZCS: a zeroth order classifier system. Evol Comput 2:1–18CrossRefGoogle Scholar
- Wilson SW (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175CrossRefGoogle Scholar
- Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn, vol 525. Morgan Kaufmann, San FranciscoGoogle Scholar
- Wright SP (1992) Adjusted
*p*-values for simultaneous inference. Biometrics 48:1005–1013CrossRefGoogle Scholar - Youden W (1950) Index for rating diagnostic tests. Cancer 3:32–35CrossRefGoogle Scholar
- Zar JH (1999) Biostatistical analysis, vol 929. Prentice Hall, Englewood CliffsGoogle Scholar