Abstract
ID3's information gain heuristic is well known to be biased towards multi-valued attributes. This bias is only partially compensated for by C4.5's gain ratio. Several alternatives have been proposed and are examined here (distance, orthogonality, a Beta function, and two chi-squared tests). All of these metrics are biased towards splits with smaller branches, where low-entropy splits are likely to occur by chance. Both classical and Bayesian statistics lead to the multiple hypergeometric distribution as the exact posterior probability of the null hypothesis that the class distribution is independent of the split. Both gain and the chi-squared tests arise in asymptotic approximations to the hypergeometric, with similar criteria for their admissibility. Previous failures of pre-pruning are traced in large part to coupling these biased approximations with one another or with arbitrary thresholds, problems that are overcome by the hypergeometric. The choice of split-selection metric typically has little effect on accuracy, but can profoundly affect complexity and the effectiveness and efficiency of pruning. Empirical results show that hypergeometric pre-pruning should be done in most cases, as trees pruned in this way are simpler and more efficient, and typically no less accurate than unpruned or post-pruned trees.
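The abstract's central quantity, the multiple hypergeometric probability of a split's class-by-branch contingency table under the null hypothesis that class is independent of the split, can be sketched in a few lines. The sketch below is illustrative only, not the paper's implementation: the example `split` table and the `info_gain` comparison function are assumptions added for clarity. With fixed row (branch) and column (class) totals, the probability of an observed table is (∏ row totals! · ∏ column totals!) / (N! · ∏ cell counts!).

```python
from math import factorial, log2

def table_probability(table):
    """Exact (multiple hypergeometric) probability of an observed
    class-by-branch contingency table, given its row and column totals,
    under the null hypothesis that class is independent of the split."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    numerator = 1
    for t in row_totals + col_totals:
        numerator *= factorial(t)
    denominator = factorial(n)
    for row in table:
        for cell in row:
            denominator *= factorial(cell)
    return numerator / denominator

def info_gain(table):
    """ID3-style information gain of the same split, for comparison."""
    def entropy(counts):
        total = sum(counts)
        return -sum(c / total * log2(c / total) for c in counts if c)
    n = sum(sum(r) for r in table)
    class_totals = [sum(c) for c in zip(*table)]
    return entropy(class_totals) - sum(
        (sum(r) / n) * entropy(r) for r in table)

# Hypothetical split: rows are branches, columns are classes.
split = [[8, 2],   # branch 1: 8 of class A, 2 of class B
         [1, 9]]   # branch 2: 1 of class A, 9 of class B
p = table_probability(split)  # small p: unlikely under independence
g = info_gain(split)          # high gain for the same split
```

A very improbable table under independence (small `p`) corresponds to a split worth keeping; unlike gain, this exact probability does not systematically favor many-valued attributes or tiny branches.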
References
Agresti, A. (1990). Categorical data analysis. New York: John Wiley & Sons.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Pacific Grove, CA: Wadsworth & Brooks.
Buntine, W. L. (1990). A theory of learning classification rules. PhD thesis. University of Technology, Sydney.
Buntine, W. & Niblett, T. (1992). A further comparison of splitting rules for decision-tree induction. Machine Learning, 8, 75–85.
Cestnik, B., Kononenko, I. & Bratko, I. (1987). ASSISTANT 86: A knowledge-elicitation tool for sophisticated users. In Progress in Machine Learning, EWSL-87. Wilmslow: Sigma Press.
Cestnik, B. & Bratko, I. (1991). On estimating probabilities in tree pruning. In Machine Learning, EWSL-91. Berlin: Springer-Verlag.
Cochran, W. G. (1954). Some methods for strengthening the common χ² tests. Biometrics, 10, 417–451.
Elder, J. F. (1995). Heuristic search for model structure. In Fisher, D. & Lenz, H-J. (Eds.) Learning from Data: Artificial Intelligence and Statistics V, Lecture Notes in Statistics, v. 112 (pp. 131-142). New York: Springer.
Fayyad, U. M., & Irani, K. B. (1992a). The attribute selection problem in decision tree generation. Proceedings of the 10th National Conference on Artificial Intelligence, AAAI-92 (pp. 104–110). Cambridge, MA: MIT Press.
Fayyad, U. M., & Irani, K. B. (1992b). On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8, 87-102.
Fayyad, U. M., & Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93) (pp. 1022-1027). San Mateo, CA: Morgan Kaufmann.
Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2, 139-172.
Fisher, D. H. (1992). Pessimistic and optimistic induction. Technical Report CS-92-22, Department of Computer Science, Vanderbilt University, Nashville, TN.
Fisher, D. H., & Schlimmer, J. C. (1988). Concept simplification and prediction accuracy. Proceedings of the 5th International Conference on Machine Learning (ML-88) (pp. 22–28). San Mateo, CA: Morgan Kaufmann.
Fulton, T., Kasif, S., & Salzberg, S. (1995). Efficient algorithms for finding multi-way splits for decision trees. Machine Learning: Proceedings of the 12th International Conference (ML-95) (pp. 244-251). San Francisco: Morgan Kaufmann.
Gluck, M. A., & Corter, J. E. (1985). Information, uncertainty, and the utility of categories. Proceedings of the 7th Annual Conference of the Cognitive Science Society (pp. 283–287). Hillsdale, NJ: Lawrence Erlbaum.
John, G. H. (1995). Robust linear discriminant trees. In Fisher, D. & Lenz, H-J. (Eds.) Learning from Data: Artificial Intelligence and Statistics V, Lecture Notes in Statistics, v. 112 (pp. 375-386). New York: Springer.
Kira, K. & Rendell, L. A. (1992). A practical approach to feature selection. Machine Learning: Proceedings of the 9th International Conference (ML-92) (pp. 249-256). San Mateo, CA: Morgan Kaufmann.
Kononenko, I. (1994). Estimating attributes: Analysis and extensions of RELIEF. Proceedings of the European Conference on Machine Learning (ECML-94), (pp. 171-182). Berlin: Springer.
Liu, W. Z., & White, A. P. (1994). The importance of attribute-selection measures in decision tree induction. Machine Learning, 15, 25–41.
López de Mántaras, R. (1991). A distance-based attribute selection measure for decision tree induction. Machine Learning, 6, 81–92.
Martin, J. K. (1995). An exact probability metric for decision tree splitting and stopping. Technical Report 95-16, Department of Information & Computer Science, University of California, Irvine, CA.
Martin, J. K. & Hirschberg, D. S. (1996a). On the complexity of learning decision trees. Proceedings Fourth International Symposium on Artificial Intelligence and Mathematics, AI/MATH-96 (pp. 112-115). Fort Lauderdale, FL.
Martin, J. K. & Hirschberg, D. S. (1996b). Small sample statistics for classification error rates I: Error rate measurements. Technical Report 96-21, Department of Information & Computer Science, University of California, Irvine, CA.
Martin, J. K. & Hirschberg, D. S. (1996c). Small sample statistics for classification error rates II: Confidence intervals and significance tests. Technical Report 96-22, Department of Information & Computer Science, University of California, Irvine, CA.
Mingers, J. (1987). Expert systems — rule induction with statistical data. Journal of the Operational Research Society, 38, 39–47.
Mingers, J. (1989a). An empirical comparison of pruning measures for decision tree induction. Machine Learning, 4, 227–243.
Mingers, J. (1989b). An empirical comparison of selection measures for decision tree induction. Machine Learning, 3, 319–342.
Murphy, P. M., & Aha, D. W. (1995). UCI Repository of Machine Learning Databases. (machine-readable data depository). Department of Information & Computer Science, University of California, Irvine, CA.
Murphy, P. M. & Pazzani, M. J. (1991). ID2-of-3: Constructive induction of M-of-N concepts for discriminators in decision trees. Machine Learning: Proceedings of the 8th International Workshop (ML-91) (pp. 183–187). San Mateo, CA: Morgan Kaufmann.
Murthy, S., Kasif, S., Salzberg, S., & Beigel, R. (1993). OC1: Randomized induction of oblique decision trees. Proceedings of the 11th National Conference on Artificial Intelligence (AAAI-93) (pp. 322–327). Menlo Park, CA: AAAI Press.
Murthy, S. & Salzberg, S. (1995). Lookahead and pathology in decision tree induction. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95) (pp. 1025–1031). San Mateo, CA: Morgan Kaufmann.
Niblett, T. (1987). Constructing decision trees in noisy domains. In Progress in Machine Learning, EWSL-87. Wilmslow: Sigma Press.
Niblett, T., & Bratko, I. (1986). Learning decision rules in noisy domains. In Proceedings of Expert Systems 86. Cambridge: Cambridge University Press.
Park, Y. & Sklansky, J. (1990). Automated design of linear tree classifiers. Pattern Recognition, 23, 1393-1412.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.
Quinlan, J. R. (1988). Simplifying decision trees. In B. R. Gaines & J. H. Boose (Eds.). Knowledge Acquisition for Knowledge-Based Systems. San Diego: Academic Press.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
Quinlan, J. R. & Cameron-Jones, R.M. (1995). Oversearching and layered search in empirical learning. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95) (pp. 1019-1024). San Mateo, CA: Morgan Kaufmann.
Schaffer, C. (1993). Overfitting avoidance as bias. Machine Learning, 10, 153–178.
Shavlik, J.W., Mooney, R. J., & Towell, G. G. (1991). Symbolic and neural learning algorithms: An experimental comparison. Machine Learning, 6, 111-143.
Weiss, S. M. & Indurkhya, N. (1991). Reduced complexity rule induction. Proceedings of the 12th International Joint Conference on Artificial Intelligence, IJCAI-91 (pp. 678–684). San Mateo, CA: Morgan Kaufmann.
Weiss, S. M. & Indurkhya, N. (1994). Small sample decision tree pruning. Proceedings of the 11th International Conference on Machine Learning (ML-94) (pp. 335–342). San Francisco: Morgan Kaufmann.
White, A. P. & Liu, W. Z. (1994). Bias in information-based measures in decision tree induction. Machine Learning, 15, 321–329.
Martin, J.K. An Exact Probability Metric for Decision Tree Splitting and Stopping. Machine Learning 28, 257–291 (1997). https://doi.org/10.1023/A:1007367629006