Speeding Up Logistic Model Tree Induction

  • Marc Sumner
  • Eibe Frank
  • Mark Hall
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3721)

Abstract

Logistic Model Trees have been shown to be very accurate and compact classifiers [8]. Their greatest disadvantage is the computational complexity of inducing the logistic regression models in the tree. We address this issue by using the AIC criterion [1] instead of cross-validation to prevent overfitting of these models. In addition, we use a weight trimming heuristic that produces a significant speedup. We compare the training time and accuracy of the new induction process with the original one on various datasets and show that the training time often decreases while the classification accuracy diminishes only slightly.
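The two ideas in the abstract lend themselves to a compact illustration: AIC can be evaluated on the training data after every LogitBoost iteration, so selecting the number of iterations needs no repeated cross-validation fits, and weight trimming restricts each iteration's regression fit to the highest-weight instances. The NumPy sketch below is an illustration only, not the authors' implementation: the function name logitboost_aic, the use of one-attribute ("simple") regression functions, the trim fraction, and the charge of one degree of freedom per boosting iteration in the AIC term are all assumptions made for this example.

    import numpy as np

    def logitboost_aic(X, y, max_iter=200, trim=0.10):
        # Two-class LogitBoost with one-attribute regression functions,
        # weight trimming, and AIC tracking. Labels y must be in {0, 1}.
        # Returns the AIC-selected iteration count and its AIC score.
        n, d = X.shape
        F = np.zeros(n)                            # additive log-odds model
        best_aic, best_m = np.inf, 0
        for m in range(1, max_iter + 1):
            p = 1.0 / (1.0 + np.exp(-F))
            w = np.clip(p * (1.0 - p), 1e-10, None)   # LogitBoost weights
            z = (y - p) / w                           # working response
            # Weight trimming: fit only on the heaviest instances that
            # together carry a (1 - trim) share of the total weight.
            order = np.argsort(w)[::-1]
            cum = np.cumsum(w[order]) / w.sum()
            k = order[:np.searchsorted(cum, 1.0 - trim) + 1]
            # Pick the attribute whose weighted simple regression on the
            # working response has the smallest weighted squared error.
            best = None
            for j in range(d):
                xj, wk, zk = X[k, j], w[k], z[k]
                xm = np.average(xj, weights=wk)
                b1 = (np.dot(wk * (xj - xm), zk)
                      / max(np.dot(wk, (xj - xm) ** 2), 1e-10))
                b0 = np.average(zk, weights=wk) - b1 * xm
                sse = np.dot(wk, (zk - b0 - b1 * xj) ** 2)
                if best is None or sse < best[0]:
                    best = (sse, j, b0, b1)
            _, j, b0, b1 = best
            F += 0.5 * (b0 + b1 * X[:, j])         # standard two-class update
            # AIC = -2 log-likelihood + 2 * complexity; charging one
            # parameter per iteration is a simplifying assumption here.
            pc = np.clip(1.0 / (1.0 + np.exp(-F)), 1e-10, 1.0 - 1e-10)
            aic = (-2.0 * np.sum(y * np.log(pc) + (1 - y) * np.log(1 - pc))
                   + 2.0 * m)
            if aic < best_aic:
                best_aic, best_m = aic, m
        return best_m, best_aic

Because the training-data log-likelihood is evaluated once per iteration, AIC-based selection costs a single boosting run; cross-validation, by contrast, repeats the whole run once per fold, which is where the training-time savings come from.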

References

  1. Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Second Int. Symposium on Information Theory, pp. 267–281 (1973)
  2. Blake, C.L., Merz, C.J.: UCI repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
  3. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth (1984)
  4. Bühlmann, P., Yu, B.: Boosting, model selection, lasso and nonnegative garrote. Technical Report 2005-127, Seminar for Statistics, ETH Zürich (2005)
  5. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Proc. Int. Conf. on Machine Learning, pp. 148–156. Morgan Kaufmann, San Francisco (1996)
  6. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. The Annals of Statistics 28(2), 337–374 (2000)
  7. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Heidelberg (2001)
  8. Landwehr, N., Hall, M., Frank, E.: Logistic model trees. Machine Learning 59(1/2), 161–205 (2005)
  9. Nadeau, C., Bengio, Y.: Inference for the generalization error. In: Advances in Neural Information Processing Systems, vol. 12, pp. 307–313. MIT Press, Cambridge (1999)
  10. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
  11. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Marc Sumner¹,²
  • Eibe Frank²
  • Mark Hall²
  1. Institute for Computer Science, University of Freiburg, Freiburg, Germany
  2. Department of Computer Science, University of Waikato, Hamilton, New Zealand
