Investigating fitness functions for a hyper-heuristic evolutionary algorithm in the context of balanced and imbalanced data classification

Barros, Rodrigo C.; Basgalupp, Márcio P.; de Carvalho, André C. P. L. F.

doi:10.1007/s10710-014-9235-z

Published: 26 October 2014

Volume 16, pages 241–281, (2015)

Genetic Programming and Evolvable Machines

Rodrigo C. Barros¹,
Márcio P. Basgalupp² &
André C. P. L. F. de Carvalho³

Abstract

In this paper, we analyse in detail the impact of different strategies to be used as fitness function during the evolutionary cycle of a hyper-heuristic evolutionary algorithm that automatically designs decision-tree induction algorithms (HEAD-DT). We divide the experimental scheme into two distinct scenarios: (1) evolving a decision-tree induction algorithm from multiple balanced data sets; and (2) evolving a decision-tree induction algorithm from multiple imbalanced data sets. In each of these scenarios, we analyse the difference in performance of well-known classification performance measures such as accuracy, F-Measure, AUC, recall, and also a lesser-known criterion, namely the relative accuracy improvement. In addition, we analyse different schemes of aggregation, such as simple average, median, and harmonic mean. Finally, we verify whether the best-performing fitness functions are capable of providing HEAD-DT with algorithms more effective than traditional decision-tree induction algorithms like C4.5, CART, and REPTree. Experimental results indicate that HEAD-DT is a good option for generating algorithms tailored to (im)balanced data, since it outperforms state-of-the-art decision-tree induction algorithms with statistical significance.
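
The aggregation schemes named in the abstract (simple average, median, and harmonic mean) can be illustrated with a minimal Python sketch. It assumes that a candidate algorithm has already been evaluated on several data sets and that each evaluation produced one score in (0, 1], such as an F-Measure. The function name `aggregate_fitness` and the sample scores are hypothetical and are not taken from the paper or from HEAD-DT's implementation.

```python
from statistics import mean, median, harmonic_mean


def aggregate_fitness(scores, scheme="average"):
    """Collapse per-data-set scores into a single fitness value.

    `scores` holds one performance value per data set (hypothetical input).
    `scheme` selects one of the three aggregation schemes from the abstract.
    """
    aggregators = {
        "average": mean,            # simple arithmetic mean
        "median": median,           # robust to a few extreme data sets
        "harmonic": harmonic_mean,  # dominated by the weakest data sets
    }
    if scheme not in aggregators:
        raise ValueError(f"unknown aggregation scheme: {scheme!r}")
    return aggregators[scheme](scores)


# Illustrative scores only: strong on two data sets, weak on the third.
f_measures = [0.9, 0.8, 0.1]

for scheme in ("average", "median", "harmonic"):
    print(scheme, round(aggregate_fitness(f_measures, scheme), 4))
```

On these sample scores the harmonic mean falls well below the average and the median, because a single poorly handled data set pulls it down. This is one reason the choice of aggregation scheme can change which candidate algorithm the evolutionary search prefers.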

Acknowledgments

This work was funded by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Project 2009/14325-3.

Author information

Authors and Affiliations

Faculdade de Informática (FACIN), Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brazil
Rodrigo C. Barros
Instituto de Ciência e Tecnologia (ICT), Universidade Federal de São Paulo (UNIFESP), São José dos Campos, Brazil
Márcio P. Basgalupp
Instituto de Ciências Matemáticas e de Computação (ICMC), Universidade de São Paulo (USP), São Carlos, Brazil
André C. P. L. F. de Carvalho

Corresponding author

Correspondence to Rodrigo C. Barros.

Additional information

Area Editor for Data Analytics and Knowledge Discovery: Una-May O'Reilly.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (xlsx 138 KB)

About this article

Cite this article

Barros, R.C., Basgalupp, M.P. & de Carvalho, A.C.P.L.F. Investigating fitness functions for a hyper-heuristic evolutionary algorithm in the context of balanced and imbalanced data classification. Genet Program Evolvable Mach 16, 241–281 (2015). https://doi.org/10.1007/s10710-014-9235-z

Received: 20 October 2013
Revised: 21 July 2014
Published: 26 October 2014
Issue Date: September 2015
DOI: https://doi.org/10.1007/s10710-014-9235-z
