Class-Oriented Reduction of Decision Tree Complexity

  • José-Luis Polo
  • Fernando Berzal
  • Juan-Carlos Cubero
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4994)

Abstract

In some classification problems, apart from a good model, we might be interested in obtaining succinct explanations for particular classes. Our goal is to provide simpler classification models for these classes without a significant accuracy loss. In this paper, we propose some modifications to the splitting criteria and the pruning heuristics used by standard top-down decision tree induction algorithms. These modifications allow us to take the importance of each particular class into account and lead to simpler models for the most important classes while, at the same time, preserving the overall classifier accuracy.
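The abstract does not spell out the exact weighting scheme, but the idea of a class-importance-aware splitting criterion can be sketched as a weighted variant of the usual entropy-based information gain, where each class's count is scaled by a user-supplied importance weight. The function names and the default-weight-of-1 convention below are illustrative assumptions, not the authors' implementation:

```python
import math
from collections import Counter

def weighted_entropy(labels, class_weights):
    """Entropy of a class distribution in which each class's count is
    scaled by its importance weight (hypothetical scheme; unlisted
    classes default to weight 1.0, recovering standard entropy)."""
    weighted = {c: n * class_weights.get(c, 1.0)
                for c, n in Counter(labels).items()}
    total = sum(weighted.values())
    return -sum((w / total) * math.log2(w / total)
                for w in weighted.values() if w > 0)

def weighted_gain(parent_labels, child_splits, class_weights):
    """Information gain computed with the class-weighted entropy above:
    a split that isolates an important class scores relatively higher."""
    n = len(parent_labels)
    children = sum(len(s) / n * weighted_entropy(s, class_weights)
                   for s in child_splits)
    return weighted_entropy(parent_labels, class_weights) - children
```

With all weights equal this reduces to the standard C4.5-style gain; raising one class's weight biases the tree builder toward splits (and, symmetrically, pruning decisions) that keep that class's subtrees small and pure.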



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • José-Luis Polo¹
  • Fernando Berzal¹
  • Juan-Carlos Cubero¹

  1. Department of Computer Sciences and Artificial Intelligence, University of Granada, Granada, Spain
