HEAD-DT: Experimental Analysis

Barros, Rodrigo C.; de Carvalho, André C. P. L. F.; Freitas, Alex A.

doi:10.1007/978-3-319-14231-9_5

Part of the book series: SpringerBriefs in Computer Science

Abstract

In this chapter, we present several empirical analyses that assess the performance of HEAD-DT in different scenarios. We divide these analyses into two sets of experiments, according to the meta-training strategy employed for automatically designing the decision-tree algorithms. As mentioned in Chap. 4, HEAD-DT can operate in two distinct frameworks: (i) evolving a decision-tree induction algorithm tailored to one specific data set (specific framework); or (ii) evolving a decision-tree induction algorithm from multiple data sets (general framework). The specific framework provides data from a single data set to HEAD-DT for both algorithm design (evolution) and performance assessment. The experiments conducted for this scenario (see Sect. 5.1) make use of public data sets that do not share a common application domain. In the general framework, distinct data sets are used for algorithm design and performance assessment. In this scenario (see Sect. 5.2), we conduct two types of experiments, namely the homogeneous approach and the heterogeneous approach. In the homogeneous approach, we analyse whether automatically designing a decision-tree algorithm for a particular domain provides good results. More specifically, the data sets that feed HEAD-DT during evolution, and also those employed for performance assessment, share a common application domain. In the heterogeneous approach, we investigate whether HEAD-DT is capable of generating an algorithm that performs well across a variety of different data sets, regardless of their particular characteristics or application domain. We also discuss the theoretical and empirical time complexity of HEAD-DT in Sect. 5.3, and we briefly discuss the cost-effectiveness of automated algorithm design in Sect. 5.4. We present examples of algorithms that were automatically designed by HEAD-DT in Sect. 5.5. We conclude the experimental analysis by empirically verifying in Sect. 5.6 whether the genetic search is worthwhile.

Notes

1. http://archive.ics.uci.edu/ml/.
2. The values of \(l\) and \(c\) for each data set can be found at http://algorithmics.molgen.mpg.de/Static/Supplements/CompCancer/datasets.htm.
3. http://archive.ics.uci.edu/ml/.
4. The term overfitting is not used because it refers to a model that overfits the data, whereas we are talking about an algorithm that “overfits” the data, in the sense that it is excellent on the data sets it was designed for but underperforms on previously unseen data sets.

Author information

Authors and Affiliations

Faculdade de Informática, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, RS, Brazil
Rodrigo C. Barros
Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, SP, Brazil
André C. P. L. F. de Carvalho
School of Computing, University of Kent, Canterbury, Kent, UK
Alex A. Freitas

Corresponding author

Correspondence to Rodrigo C. Barros.

About this chapter

Cite this chapter

Barros, R.C., de Carvalho, A.C.P.L.F., Freitas, A.A. (2015). HEAD-DT: Experimental Analysis. In: Automatic Design of Decision-Tree Induction Algorithms. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-14231-9_5

DOI: https://doi.org/10.1007/978-3-319-14231-9_5
Published: 05 February 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14230-2
Online ISBN: 978-3-319-14231-9
eBook Packages: Computer Science; Computer Science (R0)
