Machine Learning, Volume 6, Issue 1, pp 81–92

A Distance-Based Attribute Selection Measure for Decision Tree Induction

  • R. López De Mántaras
Article

Abstract

This note introduces a new attribute selection measure for ID3-like inductive algorithms. This measure is based on a distance between partitions such that the selected attribute in a node induces the partition which is closest to the correct partition of the subset of training examples corresponding to this node. The relationship of this measure with Quinlan's information gain is also established. It is also formally proved that our distance is not biased towards attributes with large numbers of values. Experimental studies with this distance confirm previously reported results showing that the predictive accuracy of induced decision trees is not sensitive to the goodness of the attribute selection measure. However, this distance produces smaller trees than the gain ratio measure of Quinlan, especially in the case of data whose attributes have significantly different numbers of values.
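
The abstract does not state the formula for the distance. As a minimal sketch, assuming the normalized form d_N(A, C) = (H(A|C) + H(C|A)) / H(A, C), where A is the partition induced by an attribute, C is the class partition, and H is Shannon entropy, the selection step could look like the code below. The function names `entropy`, `partition_distance`, and `select_attribute` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the partition defined by `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def partition_distance(attr_values, class_labels):
    """Assumed normalized distance between the attribute and class partitions.

    Uses the identity H(A|C) + H(C|A) = 2*H(A,C) - H(A) - H(C).
    Returns a value in [0, 1]; 0 means the two partitions coincide.
    """
    h_a = entropy(attr_values)
    h_c = entropy(class_labels)
    h_joint = entropy(list(zip(attr_values, class_labels)))
    if h_joint == 0:
        return 0.0  # both partitions are trivial (a single block each)
    return (2 * h_joint - h_a - h_c) / h_joint


def select_attribute(columns, class_labels):
    """Return the attribute name whose partition is closest to the class partition."""
    return min(columns, key=lambda name: partition_distance(columns[name], class_labels))


if __name__ == "__main__":
    classes = [0, 0, 1, 1]
    columns = {
        "binary": ["a", "a", "b", "b"],  # splits the examples exactly by class
        "id": [1, 2, 3, 4],              # one distinct value per example
    }
    for name, values in columns.items():
        print(name, partition_distance(values, classes))
    print(select_attribute(columns, classes))
```

In this toy data both attributes have the same information gain (1 bit), yet the many-valued `id` attribute gets a larger distance than `binary`, so `binary` is selected. This illustrates, under the assumed formula, the kind of bias toward many-valued attributes that the abstract says the measure avoids.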

Keywords: Distance between partitions, decision tree induction, information measures

References

  1. Bratko, I., & Kononenko, I. (1986). Learning diagnostic rules from incomplete and noisy data. Seminar on AI Methods in Statistics. London.
  2. Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Belmont, CA: Wadsworth International Group.
  3. Cestnik, B., Kononenko, I., & Bratko, I. (1987). ASSISTANT 86: A knowledge-elicitation tool for sophisticated users. In I. Bratko & N. Lavrac (Eds.), Progress in machine learning. Sigma Press.
  4. Clark, P., & Niblett, T. (1987). Induction in noisy domains. In I. Bratko & N. Lavrac (Eds.), Progress in machine learning. Sigma Press.
  5. Hart, A. (1984). Experience in the use of an inductive system in knowledge engineering. In M. Bramer (Ed.), Research and developments in expert systems. Cambridge University Press.
  6. Kononenko, I., Bratko, I., & Roskar, E. (1984). Experiments in automatic learning of medical diagnostic rules (Technical Report). Ljubljana, Yugoslavia: Jozef Stefan Institute.
  7. Lopez de Mantaras, R. (1977). Autoapprentissage d'une partition: Application au classement iteratif de donnees multidimensionelles. Ph.D. thesis, Paul Sabatier University, Toulouse, France.
  8. Mingers, J. (1989). An empirical comparison of selection measures for decision-tree induction. Machine Learning, 3, 319-342.
  9. Quinlan, J.R. (1979). Discovering rules by induction from large collections of examples. In D. Michie (Ed.), Expert systems in the microelectronic age. Edinburgh University Press.
  10. Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.

Copyright information

© Kluwer Academic Publishers 1991

Authors and Affiliations

  • R. López De Mántaras (1)
  1. Centre of Advanced Studies, CSIC, Girona, Spain
