
Genetic Programming and Evolvable Machines, Volume 16, Issue 3, pp 241–281

Investigating fitness functions for a hyper-heuristic evolutionary algorithm in the context of balanced and imbalanced data classification

  • Rodrigo C. Barros (Email author)
  • Márcio P. Basgalupp
  • André C. P. L. F. de Carvalho

Abstract

In this paper, we analyse in detail the impact of different strategies for the fitness function used during the evolutionary cycle of a hyper-heuristic evolutionary algorithm that automatically designs decision-tree induction algorithms (HEAD-DT). We divide the experimental scheme into two distinct scenarios: (1) evolving a decision-tree induction algorithm from multiple balanced data sets; and (2) evolving a decision-tree induction algorithm from multiple imbalanced data sets. In each of these scenarios, we analyse the difference in performance obtained with well-known classification performance measures such as accuracy, F-Measure, AUC, and recall, and also with a lesser-known criterion, namely the relative accuracy improvement. In addition, we analyse different aggregation schemes, such as simple average, median, and harmonic mean. Finally, we verify whether the best-performing fitness functions are capable of providing HEAD-DT with algorithms more effective than traditional decision-tree induction algorithms like C4.5, CART, and REPTree. Experimental results indicate that HEAD-DT is a good option for generating algorithms tailored to (im)balanced data, since it outperforms state-of-the-art decision-tree induction algorithms with statistical significance.
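
The aggregation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each candidate algorithm has already been scored on every data set with one performance measure (for example F-Measure) and that the fitness is a single aggregate of those scores. The function name, dictionary, and sample values below are hypothetical and are not taken from HEAD-DT itself.

```python
from statistics import harmonic_mean, mean, median

# Hypothetical mapping: the abstract names these three aggregation schemes,
# but the identifiers used here are illustrative assumptions.
AGGREGATORS = {
    "average": mean,
    "median": median,
    "harmonic": harmonic_mean,
}


def aggregate_fitness(scores, scheme="average"):
    """Collapse per-data-set scores (each in [0, 1]) into one fitness value."""
    if not scores:
        raise ValueError("need at least one per-data-set score")
    return AGGREGATORS[scheme](scores)


# Made-up scores of one candidate algorithm on three data sets.
per_dataset_f_measure = [0.9, 0.6, 0.3]
for scheme in AGGREGATORS:
    print(scheme, round(aggregate_fitness(per_dataset_f_measure, scheme), 4))
```

With these made-up scores the harmonic mean is lower than the simple average, because it penalises a poor result on any single data set more heavily. This is one reason the choice of aggregation scheme can change which candidate algorithm is preferred.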

Keywords

Hyper-heuristics; Decision trees; Fitness function; Imbalanced data

Notes

Acknowledgments

This work was funded by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Project 2009/14325-3.

Supplementary material

Supplementary material 1: 10710_2014_9235_MOESM1_ESM.xlsx (138 KB)

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Rodrigo C. Barros (1), Email author
  • Márcio P. Basgalupp (2)
  • André C. P. L. F. de Carvalho (3)

  1. Faculdade de Informática (FACIN), Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brazil
  2. Instituto de Ciência e Tecnologia (ICT), Universidade Federal de São Paulo (UNIFESP), São José dos Campos, Brazil
  3. Instituto de Ciências Matemáticas e de Computação (ICMC), Universidade de São Paulo (USP), São Carlos, Brazil
