Generalized Conditional Entropy and a Metric Splitting Criterion for Decision Trees

  • Dan A. Simovici
  • Szymon Jaroszewicz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3918)

Abstract

We examine a new approach to building decision trees by introducing a geometric splitting criterion based on the properties of a family of metrics on the space of partitions of a finite set. This criterion can be adapted to the characteristics of the data sets and the needs of the users, and it yields decision trees that are smaller and have fewer leaves than trees built with standard methods, with comparable or better accuracy.
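
The abstract does not give the formula for the criterion, so the following is only a minimal sketch of what a metric splitting criterion on partitions could look like, not the authors' implementation. It assumes a Daróczy-style generalized entropy with parameter beta, a conditional version in which each block is weighted by its relative size raised to the power beta, and a distance defined as the sum of the two conditional entropies; the attribute whose partition lies closest to the class partition is chosen. All function names and the normalization constant are illustrative assumptions.

```python
from collections import Counter
from math import isclose, log2


def beta_entropy(labels, beta=2.0):
    """Generalized entropy of the partition induced by `labels` (assumed form)."""
    n = len(labels)
    if n == 0:
        return 0.0
    fractions = [count / n for count in Counter(labels).values()]
    if isclose(beta, 1.0):
        # Limit case beta -> 1: Shannon entropy in bits.
        return -sum(p * log2(p) for p in fractions)
    norm = 1.0 - 2.0 ** (1.0 - beta)
    return (1.0 - sum(p ** beta for p in fractions)) / norm


def conditional_beta_entropy(x, y, beta=2.0):
    """Entropy of x's partition inside each block of y, weighted by (|block|/n)**beta."""
    n = len(x)
    blocks = {}
    for xi, yi in zip(x, y):
        blocks.setdefault(yi, []).append(xi)
    return sum((len(b) / n) ** beta * beta_entropy(b, beta) for b in blocks.values())


def partition_distance(x, y, beta=2.0):
    """Symmetric distance between two partitions (assumed: sum of both conditionals)."""
    return conditional_beta_entropy(x, y, beta) + conditional_beta_entropy(y, x, beta)


def best_split(columns, target, beta=2.0):
    """Name of the attribute whose partition is closest to the class partition."""
    return min(columns, key=lambda name: partition_distance(columns[name], target, beta))


# Hypothetical toy data: attribute "a" reproduces the class partition, "b" does not.
target = ["y", "y", "n", "n"]
columns = {"a": ["p", "p", "q", "q"], "b": ["p", "q", "p", "q"]}
chosen = best_split(columns, target)
```

In this sketch, changing `beta` changes how strongly large blocks are weighted, which is one way a criterion of this kind could be adapted to a particular data set.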

Keywords

decision tree; generalized conditional entropy; metric; metric betweenness

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Dan A. Simovici (1)
  • Szymon Jaroszewicz (2)
  1. Dept. of Computer Science, University of Massachusetts at Boston, Boston
  2. Faculty of Computer and Information Systems, Technical University of Szczecin, Poland
