Benchmarking Open-Source Tree Learners in R/RWeka

  • Michael Schauerhuber
  • Achim Zeileis
  • David Meyer
  • Kurt Hornik
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

The two most popular classification tree algorithms in machine learning and statistics, C4.5 and CART, are compared in a benchmark experiment together with two more recent constant-fit tree learners from the statistics literature (QUEST, conditional inference trees). The study assesses both misclassification error and model complexity on bootstrap replications of 18 different benchmark datasets. It is carried out in the R system for statistical computing, made possible by the RWeka package, which interfaces R to the open-source machine learning toolbox Weka. Both algorithms are found to be competitive in terms of misclassification error, with the performance difference clearly varying across datasets. However, C4.5 tends to grow larger and thus more complex trees.
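The benchmarking scheme described above (bootstrap training samples, out-of-bag evaluation of misclassification error) can be sketched in R. This is a minimal illustration, not the authors' actual experimental code: it compares only C4.5 (via RWeka's `J48`) and CART (via `rpart`) on a single built-in dataset, and the number of bootstrap replications is an arbitrary choice.

```r
## Sketch of one bootstrap benchmark comparison: C4.5 (J48 from RWeka)
## vs. CART (rpart). Assumes RWeka (and a Java runtime) and rpart are installed.
library("RWeka")
library("rpart")

data("iris")
set.seed(1)
B <- 25  # number of bootstrap replications (illustrative, not from the paper)

err <- matrix(NA, nrow = B, ncol = 2, dimnames = list(NULL, c("J48", "rpart")))
for (b in seq_len(B)) {
  ## Draw a bootstrap sample; evaluate on the out-of-bag observations.
  idx  <- sample(nrow(iris), replace = TRUE)
  train <- iris[idx, ]
  test  <- iris[-unique(idx), ]

  fit_j48 <- J48(Species ~ ., data = train)
  fit_rp  <- rpart(Species ~ ., data = train)

  err[b, "J48"]   <- mean(predict(fit_j48, newdata = test) != test$Species)
  err[b, "rpart"] <- mean(predict(fit_rp, newdata = test,
                                  type = "class") != test$Species)
}
colMeans(err)  # average out-of-bag misclassification error per learner
```

Model complexity could be assessed alongside, e.g. by recording the number of leaves of each fitted tree in the same loop; aggregating such paired performance measures over many datasets is what the benchmark design of Hothorn et al. (2005) formalizes.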

Keywords

Predictive Performance · Dominance Relation · Recursive Partitioning · Tree Learner · Breast Cancer Data

References

  1. BREIMAN, L., FRIEDMAN, J., OLSHEN, R. and STONE, C. (1984): Classification and Regression Trees. Wadsworth, Belmont, CA.
  2. HORNIK, K. and MEYER, D. (2007): Deriving Consensus Rankings from Benchmarking Experiments. In: Decker, R. and Lenz, H.-J. (Eds.): Advances in Data Analysis (Proceedings of the 30th Annual Conference of the Gesellschaft für Klassifikation e.V., March 8-10, 2006, Berlin). Springer-Verlag, 163-170.
  3. HORNIK, K., ZEILEIS, A., HOTHORN, T. and BUCHTA, C. (2007): RWeka: An R Interface to Weka. R package version 0.3-2. http://CRAN.R-project.org/.
  4. HOTHORN, T., HORNIK, K. and ZEILEIS, A. (2006): Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651-674.
  5. HOTHORN, T., LEISCH, F., ZEILEIS, A. and HORNIK, K. (2005): The Design and Analysis of Benchmark Experiments. Journal of Computational and Graphical Statistics, 14(3), 675-699.
  6. LOH, W. and SHIH, Y. (1997): Split Selection Methods for Classification Trees. Statistica Sinica, 7, 815-840.
  7. NEWMAN, D., HETTICH, S., BLAKE, C. and MERZ, C. (1998): UCI Repository of Machine Learning Databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
  8. QUINLAN, J. (1993): C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, Inc., San Mateo, CA.
  9. R DEVELOPMENT CORE TEAM (2006): R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org/.
  10. THERNEAU, T. and ATKINSON, E. (1997): An Introduction to Recursive Partitioning Using the rpart Routine. Technical Report. Section of Biostatistics, Mayo Clinic, Rochester, http://www.mayo.edu/hsr/techrpt/61.pdf.
  11. WITTEN, I. and FRANK, E. (2005): Data Mining: Practical Machine Learning Tools and Techniques. 2nd edition. Morgan Kaufmann, San Francisco.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Michael Schauerhuber (1)
  • Achim Zeileis (1)
  • David Meyer (2)
  • Kurt Hornik (1)

  1. Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Wien, Austria
  2. Institute for Management Information Systems, Wirtschaftsuniversität Wien, Wien, Austria