Machine Learning, Volume 3, Issue 4, pp 319–342

An empirical comparison of selection measures for decision-tree induction

  • John Mingers

Abstract

One approach to induction is to develop a decision tree from a set of examples. When used with noisy rather than deterministic data, the method involves three main stages: creating a complete tree able to classify all the examples, pruning this tree to give statistical reliability, and processing the pruned tree to improve understandability. This paper is concerned with the first stage, tree creation, which relies on a measure of "goodness of split," that is, how well the attributes discriminate between classes. Some problems encountered at this stage are missing data and multi-valued attributes. The paper considers a number of different measures and experimentally examines their behavior in four domains. The results show that the choice of measure affects the size of a tree but not its accuracy, which remains the same even when attributes are selected randomly.
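
For concreteness, one widely used goodness-of-split measure is the information gain of Quinlan's ID3 (reference 27 below): the reduction in class entropy obtained by partitioning the examples on an attribute. The sketch below is a minimal illustration of that measure, not code from the paper; the list-of-dicts dataset representation and the function names are assumptions made for the example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute, label_key="class"):
    """Entropy reduction from splitting `examples` on `attribute`;
    larger values indicate a better discriminating attribute."""
    before = entropy([e[label_key] for e in examples])
    # Partition the examples by the attribute's value.
    partitions = {}
    for e in examples:
        partitions.setdefault(e[attribute], []).append(e)
    # Weighted average entropy of the branches after the split.
    after = sum(len(part) / len(examples)
                * entropy([e[label_key] for e in part])
                for part in partitions.values())
    return before - after

# Toy usage: choose the attribute with the highest gain as the split.
data = [
    {"outlook": "sunny", "windy": "yes", "class": "no"},
    {"outlook": "sunny", "windy": "no",  "class": "no"},
    {"outlook": "rain",  "windy": "no",  "class": "yes"},
    {"outlook": "rain",  "windy": "yes", "class": "no"},
]
best = max(["outlook", "windy"], key=lambda a: information_gain(data, a))
print(best, information_gain(data, best))
```

The paper's experiments compare several such measures (together with a random baseline) and find that the choice changes the size of the induced tree but not its classification accuracy.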

Keywords

Decision trees, knowledge acquisition, induction, noisy data

References

  1. Bratko, I., & Kononenko, I. (1986). Learning diagnostic rules from incomplete and noisy data.Seminar on AI Methods in Statistics. London: Unicom Seminars Ltd.Google Scholar
  2. Bratko, I., & Lavrac, N. (Eds.). (1987).Progress in machine learning. Wilmslow, England: Sigma Press.Google Scholar
  3. Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984).Classification and regression trees. Belmont, CA: Wadsworth International Group.Google Scholar
  4. Bundy, A., Silver, B., & Plummer, D. (1985). An analytical comparison of some rule-learning programs.Artificial Intelligence,27, 137–181.Google Scholar
  5. Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules.International Journal of Man-Machine Studies,27, 349–370.Google Scholar
  6. Cook, D., Craven, A., & Clarke, G. (1985).Statistical compating in Pascal London: Edward Arnold.Google Scholar
  7. Corlett, R. (1983). Explaining induced decision trees.Proceedings of the Third Technical Conference of the BCS Expert Systems Group. London: British Computer Society.Google Scholar
  8. Hart, A. (1984). Experience in the use of an inductive system in knowledge engineering. In M., Bramer (Ed.),Research and developments in expert systems. Cambridge: Cambridge University Press.Google Scholar
  9. Hunt, E., Marin, J., & Stone, P. (1966).Experiments in induction. New York: Academic Press.Google Scholar
  10. Kendall, M., & Stewart, A. (1976).The advanced theory of statistics (Vol. 3), London: Griffin.Google Scholar
  11. Kononenko, I., Bratko, I., & Roskar, E. (1984).Experiments in automatic learning of medical diagnostic rules (Technical report). Ljubljana. Yugoslavia: Jozef Stefan Institute.Google Scholar
  12. Kullback, S. (1967).Information theory and statistics. New York: Dover.Google Scholar
  13. Marshall, R. (1986). Partitioning methods for classification and decision making in medicine.Statistics in Medicine,5, 517–526.Google Scholar
  14. Michalski, R. S. (1978).Designing extended entry decision tables and optimal decision trees using decision diagrams (Technical Report No. 898). Urbana: University of Illinois, Department of Computer Science.Google Scholar
  15. Michalski, R. S., & Chilausky, C. (1980). Learning by being told and learning from examples: An experimental comparison of the two methods of knowledge acquisition in the context of developing an expert system for soybean disease diagnosis.International Journal of Policy Analysis and Information Systems,4, 125–161.Google Scholar
  16. Michalski, R. S., Carbonell, J. G., & Mitchell, T. M. (Eds.) (1983).Machine learning: An artificial intelligence approach. Los Altos, CA: Morgan Kaufmann.Google Scholar
  17. Michalski, R. S., Carbonell, J. G., & Mitchell, T. M. (Eds.) (1986).Machine learning: An artificial intelligence approach (Vol. 2). Los Altos, CA: Morgan Kaufmann.Google Scholar
  18. Mingers, J. (1986a). Inducing rules for expert systems — statistical aspects.The Professional Statistician,5, 19–24.Google Scholar
  19. Mingers, J. (1986b). Expert systems — experiments with rule induction.Journal of the Operational Research Society,37, 1031–1037.Google Scholar
  20. Mingers, J. (1987a). Expert systems — rule induction with statistical dataJournal of the Operational Research Society,38, 39–47.Google Scholar
  21. Mingers, J. (1987b). Rule induction with statistical data — a comparison with multiple regression.Journal of the Operational Research Society,38, 347–352.Google Scholar
  22. Mingers, J. (1988).A comparison of methods of pruning induced rule trees (Technical Report). Coventry, England: University of Warwick, School of Industrial and Business Studies.Google Scholar
  23. Quinlan, J. R. (1979). Discovering rules from large collections of examples: A case study. In D., Michie (Ed.),Expert systems in the micro electronic age. Edinburgh: Edinburgh University Press.Google Scholar
  24. Quinlan, J. R. (1983). Learning efficient classification procedures and their application to chess end games. In R. S., Michalski, J. G., Carbonell, & T. M., Mitchell (Eds.),Machine learning: An artificial intelligence approach. Los Altos: Morgan Kaufmann.Google Scholar
  25. Quinlan, J. R. (1985). Decision trees and multi-valued attributes. In J., Haves & D., Michie (Eds.),Machine intelligence (Vol. 11). Chichester, England: Ellis Horwood.Google Scholar
  26. Quinlan, J. R. (1986a). The effect of noise on concept learning. In R. S., Michalski, J. G., Carbonell, & T. M., Mitchell (Eds.)Machine learning: An artificial intelligence approach (Vol. 2). Los Altos: Morgan Kaufmann.Google Scholar
  27. Quinlan, J. R. (1986b). Induction of decision trees.Machine Learning,1, 81–106.Google Scholar
  28. Quinlan, J. R. (1987). Simplifying decision trees.International Journal of Man-Machine Studies,27, 221–234.Google Scholar
  29. Schlimmer, J. C. & Fisher, D. (1986). A case study of incremental concept induction.Proceedings of the Fifth National Conference on Artificial Intelligence (pp. 496–501). Philadelphia, PA: Morgan Kaufmann.Google Scholar
  30. Shepherd, B. (1983). An appraisal of a decision-tree approach to image classification.Proceedings of the Eighth International Joint Conference on Artificial Intelligence (pp. 473–475). Karlsruhe, West Germany: Morgan Kaufmann.Google Scholar
  31. Sokal, R., & Rohlf, F. (1981).Biometry. San Francisco: Freeman.Google Scholar
  32. Titterington, D., Murray, L., Murray, G., Spiegelhalter, D., Skene, A., Habbema, J., & Gelpke, G. (1981). Comparison of discrimination techniques applied to a complex data set of head injured patients.Journal of the Royal Statistical Society, A Series,144, 145–175.Google Scholar
  33. Upton, G. (1982). A comparison of alternative tests for the 2×2 comparative trial.Journal of the Royal Statistical Society, A Series,145, 86–105.Google Scholar
  34. Utgoff, P. (1988). ID5: An incremental ID3.Proceedings of the Fifth International Conference on Machine Learning (pp. 107–120). Ann Arbor, MI: Morgan Kaufmann.Google Scholar

Copyright information

© Kluwer Academic Publishers 1989

Authors and Affiliations

  • John Mingers
    1. School of Industrial and Business Studies, University of Warwick, Coventry, U.K.
