Abstract
We study impurity-based decision tree algorithms such as CART and C4.5 in order to better understand their theoretical underpinnings. We consider such algorithms on special forms of functions and distributions: specifically, the uniform distribution, and target functions expressible as Boolean linear threshold functions or read-once DNF.
We show that for Boolean linear threshold functions and read-once DNF, splitting on the variable of maximal purity gain is equivalent to splitting on the variable of maximal influence. Consequently, impurity-based algorithms exactly identify these classes of functions given sufficiently many noise-free examples. We further show that the decision tree produced by these algorithms has minimal size and minimal height among all decision trees representing the function.
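To make the equivalence concrete, the following Python sketch (our illustration, not taken from the paper) enumerates the uniform distribution over {0,1}^4 for one example linear threshold function, computes the Gini purity gain of a root split on each variable together with each variable's influence, and checks that both criteria select the same variable. The weights of the example function and the choice of Gini as the impurity measure are our own assumptions.

import itertools

def ltf(x):
    # Example Boolean linear threshold function (weights are arbitrary):
    # f(x) = 1  iff  2*x1 + x2 + x3 - x4 >= 1.5.
    return int(2*x[0] + x[1] + x[2] - x[3] >= 1.5)

def gini(points):
    # Gini impurity 2p(1-p) of the labels over a set of points.
    p = sum(ltf(x) for x in points) / len(points)
    return 2 * p * (1 - p)

def purity_gain(i, cube):
    # Purity gain of splitting on variable i; under the uniform
    # distribution each child subcube carries probability 1/2.
    left = [x for x in cube if x[i] == 0]
    right = [x for x in cube if x[i] == 1]
    return gini(cube) - 0.5 * gini(left) - 0.5 * gini(right)

def influence(i, cube):
    # Inf_i(f) = Pr_x[f(x) != f(x with bit i flipped)].
    def flip(x):
        y = list(x); y[i] ^= 1
        return tuple(y)
    return sum(ltf(x) != ltf(flip(x)) for x in cube) / len(cube)

n = 4
cube = list(itertools.product([0, 1], repeat=n))
gains = [purity_gain(i, cube) for i in range(n)]
infs = [influence(i, cube) for i in range(n)]
# Both criteria pick the same (highest-weight) variable for the root split.
assert gains.index(max(gains)) == infs.index(max(infs)) == 0

Here the gains come out as [0.28125, 0.03125, 0.03125, 0.03125] and the influences as [0.75, 0.25, 0.25, 0.25], so both criteria put the highest-weight variable at the root.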
Based on the statistical query learning model, we introduce noise-tolerant versions of practical decision tree algorithms. We show that if the examples are uniformly distributed and subject to small classification noise, then all our results for the noise-free impurity-based algorithms carry over to their noise-tolerant versions.
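The sketch below illustrates the standard statistical-query-style noise correction in the simplest setting (a generic illustration under our own assumptions, not the paper's specific construction): with classification noise of rate eta < 1/2, each observed label is flipped independently with probability eta, so any +/-1-valued correlation with the label shrinks by exactly the factor 1 - 2*eta, and dividing the empirical average by that factor yields an unbiased estimate of the noise-free quantity. The target function (majority of three bits) and the sample size are illustrative.

import random

def maj(x):
    # Example target: majority of three uniform Boolean inputs.
    return int(x[0] + x[1] + x[2] >= 2)

def noisy_sq_estimate(i, eta, m=200_000, seed=0):
    # Estimate the correlation E[f(x) * x_i] (in +/-1 encoding) from
    # uniform examples whose labels are flipped with probability eta,
    # then undo the (1 - 2*eta) shrinkage caused by the noise.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(3))
        label = maj(x) if rng.random() > eta else 1 - maj(x)
        total += (2 * label - 1) * (2 * x[i] - 1)
    return (total / m) / (1 - 2 * eta)

# For a monotone function the correlation E[f(x) * x_i] equals the
# influence of x_i, so the corrected estimates rank candidate splits
# exactly as the noise-free algorithm would (here Inf_i(maj) = 0.5).
print(noisy_sq_estimate(0, eta=0.2))  # ~0.5 up to sampling error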
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fiat, A., Pechyony, D. (2004). Decision Trees: More Theoretical Justification for Practical Algorithms. In: Ben-David, S., Case, J., Maruoka, A. (eds) Algorithmic Learning Theory. ALT 2004. Lecture Notes in Computer Science, vol. 3244. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30215-5_13
DOI: https://doi.org/10.1007/978-3-540-30215-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23356-5
Online ISBN: 978-3-540-30215-5